<div dir="ltr">That looks normal. Can you please send qstat, qacct, and qhost output from when you're seeing the problem? Thanks.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Sep 18, 2013 at 2:47 PM, Ryan Golhar <span dir="ltr"><<a href="mailto:ngsbioinformatics@gmail.com" target="_blank">ngsbioinformatics@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I've since terminated the cluster and am experimenting with a different setup, but here's the output from qstat and qhost:<div>
<br></div><div><div>ec2-user@master:~$ qstat</div><div>job-ID prior name user state submit/start at queue slots ja-task-ID </div>
<div>-----------------------------------------------------------------------------------------------------------------</div><div> 4 0.55500 j1-00493-0 ec2-user r 09/18/2013 17:38:44 all.q@node001 8 </div>
<div> 6 0.55500 j1-00508-0 ec2-user r 09/18/2013 17:45:44 all.q@node002 8 </div><div> 7 0.55500 j1-00525-0 ec2-user r 09/18/2013 17:46:29 all.q@node003 8 </div>
<div> 8 0.55500 j1-00541-0 ec2-user r 09/18/2013 17:54:59 all.q@node004 8 </div><div> 9 0.55500 j1-00565-0 ec2-user r 09/18/2013 17:55:44 all.q@node005 8 </div>
<div> 10 0.55500 j1-00596-0 ec2-user r 09/18/2013 17:58:59 all.q@node006 8 </div><div> 11 0.55500 j1-00604-0 ec2-user r 09/18/2013 18:05:14 all.q@node007 8 </div>
<div> 12 0.55500 j1-00625-0 ec2-user r 09/18/2013 18:05:14 all.q@node008 8 </div><div> 13 0.55500 j1-00650-0 ec2-user r 09/18/2013 18:05:14 all.q@node009 8 </div>
<div> 18 0.55500 j1-00734-0 ec2-user r 09/18/2013 18:07:29 all.q@node010 8 </div><div> 19 0.55500 j1-00738-0 ec2-user r 09/18/2013 18:16:59 all.q@node011 8 </div>
<div> 20 0.55500 j1-00739-0 ec2-user r 09/18/2013 18:16:59 all.q@node012 8 </div><div> 21 0.55500 j1-00770 ec2-user r 09/18/2013 18:16:59 all.q@node013 8 </div>
<div> 22 0.55500 j1-00806-0 ec2-user r 09/18/2013 18:16:59 all.q@node014 8 </div><div> 23 0.55500 j1-00825-0 ec2-user r 09/18/2013 18:16:59 all.q@node015 8 </div>
<div> 24 0.55500 j1-00826-0 ec2-user r 09/18/2013 18:16:59 all.q@node016 8 </div><div> 25 0.55500 j1-00846-0 ec2-user r 09/18/2013 18:16:59 all.q@node017 8 </div>
<div> 26 0.55500 j1-00847-0 ec2-user r 09/18/2013 18:16:59 all.q@node018 8 </div><div> 27 0.55500 j1-00913 ec2-user r 09/18/2013 18:16:59 all.q@node019 8 </div>
<div> 28 0.55500 j1-00914-0 ec2-user r 09/18/2013 18:16:59 all.q@node020 8 </div><div> 29 0.55500 j1-00914 ec2-user r 09/18/2013 18:26:29 all.q@node021 8 </div>
<div> 30 0.55500 j1-00922 ec2-user r 09/18/2013 18:26:29 all.q@node022 8 </div><div> 31 0.55500 j1-00977 ec2-user r 09/18/2013 18:26:29 all.q@node023 8 </div>
<div> 32 0.55500 j1-00984-0 ec2-user r 09/18/2013 18:26:29 all.q@node024 8 </div><div> 33 0.55500 j1-00984 ec2-user r 09/18/2013 18:26:29 all.q@node025 8 </div>
<div> 34 0.55500 j1-00998-0 ec2-user r 09/18/2013 18:26:29 all.q@node026 8 </div><div> 35 0.55500 j1-01010-0 ec2-user r 09/18/2013 18:26:29 all.q@node027 8 </div>
<div> 36 0.55500 j1-01019-0 ec2-user r 09/18/2013 18:26:29 all.q@node028 8 </div><div> 37 0.55500 j1-01025-0 ec2-user r 09/18/2013 18:26:29 all.q@node029 8 </div>
<div> 38 0.55500 j1-01026-0 ec2-user r 09/18/2013 18:26:29 all.q@node030 8 </div></div><div><br></div><div><div>ec2-user@master:~$ qhost</div><div>HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS</div>
<div>-------------------------------------------------------------------------------</div><div>global - - - - - - -</div><div>node001 linux-x64 8 7.74 6.8G 3.8G 0.0 0.0</div>
<div>node002 linux-x64 8 7.93 6.8G 3.7G 0.0 0.0</div><div>node003 linux-x64 8 7.68 6.8G 3.7G 0.0 0.0</div><div>node004 linux-x64 8 7.86 6.8G 3.8G 0.0 0.0</div>
<div>node005 linux-x64 8 7.87 6.8G 3.7G 0.0 0.0</div><div>node006 linux-x64 8 7.66 6.8G 3.7G 0.0 0.0</div><div>node007 linux-x64 8 0.01 6.8G 564.8M 0.0 0.0</div>
<div>node008 linux-x64 8 0.01 6.8G 493.6M 0.0 0.0</div><div>node009 linux-x64 8 0.02 6.8G 564.4M 0.0 0.0</div><div>node010 linux-x64 8 7.85 6.8G 3.7G 0.0 0.0</div>
<div>node011 linux-x64 8 7.53 6.8G 3.7G 0.0 0.0</div><div>node012 linux-x64 8 7.57 6.8G 3.6G 0.0 0.0</div><div>node013 linux-x64 8 7.71 6.8G 3.7G 0.0 0.0</div>
<div>node014 linux-x64 8 7.49 6.8G 3.7G 0.0 0.0</div><div>node015 linux-x64 8 7.51 6.8G 3.7G 0.0 0.0</div><div>node016 linux-x64 8 7.50 6.8G 3.6G 0.0 0.0</div>
<div>node017 linux-x64 8 7.89 6.8G 3.7G 0.0 0.0</div><div>node018 linux-x64 8 7.50 6.8G 3.7G 0.0 0.0</div><div>node019 linux-x64 8 7.52 6.8G 3.7G 0.0 0.0</div>
<div>node020 linux-x64 8 7.68 6.8G 3.6G 0.0 0.0</div><div>node021 linux-x64 8 7.16 6.8G 3.6G 0.0 0.0</div><div>node022 linux-x64 8 6.99 6.8G 3.6G 0.0 0.0</div>
<div>node023 linux-x64 8 6.80 6.8G 3.6G 0.0 0.0</div><div>node024 linux-x64 8 7.20 6.8G 3.6G 0.0 0.0</div><div>node025 linux-x64 8 6.86 6.8G 3.6G 0.0 0.0</div>
<div>node026 linux-x64 8 7.24 6.8G 3.6G 0.0 0.0</div><div>node027 linux-x64 8 6.88 6.8G 3.7G 0.0 0.0</div><div>node028 linux-x64 8 6.28 6.8G 3.6G 0.0 0.0</div>
<div>node029 linux-x64 8 7.42 6.8G 3.6G 0.0 0.0</div><div>node030 linux-x64 8 0.10 6.8G 390.4M 0.0 0.0</div><div>node031 linux-x64 8 0.06 6.8G 135.0M 0.0 0.0</div>
<div>node032 linux-x64 8 0.04 6.8G 135.3M 0.0 0.0</div><div>node033 linux-x64 8 0.07 6.8G 135.6M 0.0 0.0</div><div>node034 linux-x64 8 0.10 6.8G 134.9M 0.0 0.0</div>
</div><div><br></div><div><br></div><div>I never saw anything unusual.</div></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Sep 18, 2013 at 10:40 AM, Rajat Banerjee <span dir="ltr"><<a href="mailto:rajatb@post.harvard.edu" target="_blank">rajatb@post.harvard.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Ryan,<br>Could you put the output of qhost and qstat into a text file and send it back to the list? That's what feeds the load balancer those stats.<br>
<br></div>Thanks,<br>Rajat<br></div><div class="gmail_extra">
<br><br><div class="gmail_quote"><div><div>On Tue, Sep 17, 2013 at 11:47 PM, Ryan Golhar <span dir="ltr"><<a href="mailto:ngsbioinformatics@gmail.com" target="_blank">ngsbioinformatics@gmail.com</a>></span> wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div>
<div dir="ltr">I'm running a cluster with over 800 jobs queued, and I'm running loadbalance. Every other query by loadbalance shows an Avg job duration and Avg job wait time of 0 secs. Why is this? It hasn't caused a problem yet, but it seems odd.<div>
<br></div><div><div>>>> Loading full job history</div><div>Execution hosts: 19</div><div>Queued jobs: 791</div><div>Oldest queued job: 2013-09-17 22:19:23</div><div>Avg job duration: 3559 secs</div><div>Avg job wait time: 12389 secs</div>
<div>Last cluster modification time: 2013-09-18 00:11:31</div><div>>>> Not adding nodes: already at or above maximum (1)</div><div>>>> Sleeping...(looping again in 60 secs)</div><div><br></div><div>Execution hosts: 19</div>
<div>Queued jobs: 791</div><div>Oldest queued job: 2013-09-17 22:19:23</div><div>Avg job duration: 0 secs</div><div>Avg job wait time: 0 secs</div><div>Last cluster modification time: 2013-09-18 00:11:31</div><div>>>> Not adding nodes: already at or above maximum (1)</div>
<div>>>> Sleeping...(looping again in 60 secs)</div><div><br></div><div><br></div></div></div>
<br></div></div>_______________________________________________<br>
StarCluster mailing list<br>
<a href="mailto:StarCluster@mit.edu" target="_blank">StarCluster@mit.edu</a><br>
<a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>
<br></blockquote></div><br></div>
</blockquote></div><br></div>
</div></div><br>
<br></blockquote></div><br></div>