<div dir="ltr"><div style>Hi fellows,</div><div style><br></div><div style>I started testing StarCluster's load balancer, but something is not working: the load balancer reports that there are no jobs in the OGE queue.</div><div style>
Here is the full history:</div><div style><br></div><div style>1- Launched a 5-node cluster (Ubuntu HVM, cc1.4xlarge instances)</div><div style>-&gt; Used the mpich2 plugin (native mpich2 v1.4.1)</div><div style><br></div><div style>2- Submitted an application job (mympiapp) to OGE:</div>
<div style><div>$ qsub -N Newaveinth -b y -pe orte 80 -cwd mpiexec -n 80 mympiapp</div><div><div><br></div><div>3- Checked the queue:<br></div><div><br></div><pre>job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48 all.q@master      80
sgeadmin@master:~/pmo0113$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/16/16        3.06     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node001                  BIP   0/16/16        2.87     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node002                  BIP   0/16/16        2.74     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node003                  BIP   0/16/16        2.86     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
---------------------------------------------------------------------------------
all.q@node004                  BIP   0/16/16        2.11     linux-x64
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48    16
</pre></div><div><br></div></div><div style><div style>4- Tried to load balance the 5-node cluster:</div>
<div style><br></div><div style><div><div>ubuntu@ip-10-112-98-159:~$ starcluster loadbalance mycluster --max_nodes=6</div><div>StarCluster - (<a href="http://web.mit.edu/starcluster">http://web.mit.edu/starcluster</a>) (v. 0.93.3)</div>
<div>Software Tools for Academics and Researchers (STAR)</div><div>Please submit bug reports to <a href="mailto:starcluster@mit.edu">starcluster@mit.edu</a></div><div><br></div><div>>>> Starting load balancer (Use ctrl-c to exit)</div>
<div>Maximum cluster size: 6</div><div>Minimum cluster size: 1</div><div>Cluster growth rate: 1 nodes/iteration</div><div><br></div><div>>>> Loading full job history</div><div>Execution hosts: 5</div><div>Queued jobs: 0</div>
<div>Avg job duration: 1699 secs</div><div>Avg job wait time: 8 secs</div><div>Last cluster modification time: 2013-01-23 13:32:33</div><div>>>> Cluster was modified less than 180 seconds ago</div><div>>>> Waiting for cluster to stabilize...</div>
<div>>>> Sleeping...(looping again in 60 secs)</div><div><br></div><div>>>> Loading full job history</div><div>Execution hosts: 5</div><div>Queued jobs: 0 (&lt;-- why 0 queued jobs???)</div><div>Avg job duration: 1699 secs</div>
<div>Avg job wait time: 8 secs</div><div>Last cluster modification time: 2013-01-23 13:32:33</div><div>>>> Cluster was modified less than 180 seconds ago</div><div>>>> Waiting for cluster to stabilize...</div>
<div>>>> Sleeping...(looping again in 60 secs)</div><div><br></div><div>^C (&lt;-- pressed Ctrl-C and got a lot of messages...)</div><div>Traceback (most recent call last):</div><div> File "/usr/local/bin/starcluster", line 9, in &lt;module&gt;</div>
<div> load_entry_point('StarCluster==0.93.3', 'console_scripts', 'starcluster')()</div><div> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/cli.py", line 312, in main</div>
<div> StarClusterCLI().main()</div><div> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/cli.py", line 255, in main</div><div> sc.execute(args)</div><div> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/commands/loadbalance.py", line 90, in execute</div>
<div> lb.run(cluster)</div><div> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.93.3-py2.7.egg/starcluster/balancers/sge/__init__.py", line 619, in run</div><div> time.sleep(self.polling_interval)</div>
<div>KeyboardInterrupt</div><div>Exception in thread Thread-1 (most likely raised during interpreter shutdown):</div><div>Traceback (most recent call last):</div><div> File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner</div>
<div> File "/usr/local/lib/python2.7/dist-packages/ssh-1.7.13-py2.7.egg/ssh/transport.py", line 1602, in run</div><div>&lt;type 'exceptions.AttributeError'&gt;: 'NoneType' object has no attribute 'error'@node002</div>
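<div><br></div><div>To cross-check the load balancer's "Queued jobs: 0" against the qstat output from step 3, here is a minimal Python sketch. It assumes the default plain-text qstat layout, where the state is the fifth column and OGE/SGE uses r for running and qw for queued/waiting jobs. The names count_job_states and SAMPLE_QSTAT are hypothetical and only for illustration; this is not how StarCluster itself computes its numbers.</div>

```python
from collections import Counter

# Hypothetical sample copied from the plain `qstat` output in step 3.
# Assumes the default SGE/OGE column order: job-ID, prior, name, user, state, ...
SAMPLE_QSTAT = """\
job-ID  prior   name       user         state submit/start at     queue          slots ja-task-ID
-----------------------------------------------------------------------------------------------
      2 0.55500 Newaveinth sgeadmin     r     01/23/2013 13:29:48 all.q@master      80
"""


def count_job_states(qstat_text):
    """Return a Counter mapping state codes (e.g. 'r', 'qw') to job counts."""
    counts = Counter()
    for line in qstat_text.splitlines():
        fields = line.split()
        # Job rows start with a numeric job-ID; the header and dash rows do not.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # The state is the fifth whitespace-separated field (assumes job names
        # contain no spaces, as in the sample).
        counts[fields[4]] += 1
    return counts


if __name__ == "__main__":
    states = count_job_states(SAMPLE_QSTAT)
    # Counter returns 0 for states that never appeared.
    print("running:", states["r"], "queued:", states["qw"])
```

<div>On the step 3 output this prints one running job and zero queued ones: the only job (ID 2) is already in state r on all 80 slots, so nothing is waiting in qw.</div>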
</div><div style><br></div><div style>All the best,</div><div style><br></div><div style>Sergio</div></div></div></div>