[StarCluster] loadbalance

Ryan Golhar ngsbioinformatics at gmail.com
Wed Sep 18 14:47:52 EDT 2013


I've since terminated the cluster and am experimenting with a different
setup, but here's the output from qstat and qhost:

ec2-user@master:~$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      4 0.55500 j1-00493-0 ec2-user     r     09/18/2013 17:38:44 all.q@node001                      8
      6 0.55500 j1-00508-0 ec2-user     r     09/18/2013 17:45:44 all.q@node002                      8
      7 0.55500 j1-00525-0 ec2-user     r     09/18/2013 17:46:29 all.q@node003                      8
      8 0.55500 j1-00541-0 ec2-user     r     09/18/2013 17:54:59 all.q@node004                      8
      9 0.55500 j1-00565-0 ec2-user     r     09/18/2013 17:55:44 all.q@node005                      8
     10 0.55500 j1-00596-0 ec2-user     r     09/18/2013 17:58:59 all.q@node006                      8
     11 0.55500 j1-00604-0 ec2-user     r     09/18/2013 18:05:14 all.q@node007                      8
     12 0.55500 j1-00625-0 ec2-user     r     09/18/2013 18:05:14 all.q@node008                      8
     13 0.55500 j1-00650-0 ec2-user     r     09/18/2013 18:05:14 all.q@node009                      8
     18 0.55500 j1-00734-0 ec2-user     r     09/18/2013 18:07:29 all.q@node010                      8
     19 0.55500 j1-00738-0 ec2-user     r     09/18/2013 18:16:59 all.q@node011                      8
     20 0.55500 j1-00739-0 ec2-user     r     09/18/2013 18:16:59 all.q@node012                      8
     21 0.55500 j1-00770   ec2-user     r     09/18/2013 18:16:59 all.q@node013                      8
     22 0.55500 j1-00806-0 ec2-user     r     09/18/2013 18:16:59 all.q@node014                      8
     23 0.55500 j1-00825-0 ec2-user     r     09/18/2013 18:16:59 all.q@node015                      8
     24 0.55500 j1-00826-0 ec2-user     r     09/18/2013 18:16:59 all.q@node016                      8
     25 0.55500 j1-00846-0 ec2-user     r     09/18/2013 18:16:59 all.q@node017                      8
     26 0.55500 j1-00847-0 ec2-user     r     09/18/2013 18:16:59 all.q@node018                      8
     27 0.55500 j1-00913   ec2-user     r     09/18/2013 18:16:59 all.q@node019                      8
     28 0.55500 j1-00914-0 ec2-user     r     09/18/2013 18:16:59 all.q@node020                      8
     29 0.55500 j1-00914   ec2-user     r     09/18/2013 18:26:29 all.q@node021                      8
     30 0.55500 j1-00922   ec2-user     r     09/18/2013 18:26:29 all.q@node022                      8
     31 0.55500 j1-00977   ec2-user     r     09/18/2013 18:26:29 all.q@node023                      8
     32 0.55500 j1-00984-0 ec2-user     r     09/18/2013 18:26:29 all.q@node024                      8
     33 0.55500 j1-00984   ec2-user     r     09/18/2013 18:26:29 all.q@node025                      8
     34 0.55500 j1-00998-0 ec2-user     r     09/18/2013 18:26:29 all.q@node026                      8
     35 0.55500 j1-01010-0 ec2-user     r     09/18/2013 18:26:29 all.q@node027                      8
     36 0.55500 j1-01019-0 ec2-user     r     09/18/2013 18:26:29 all.q@node028                      8
     37 0.55500 j1-01025-0 ec2-user     r     09/18/2013 18:26:29 all.q@node029                      8
     38 0.55500 j1-01026-0 ec2-user     r     09/18/2013 18:26:29 all.q@node030                      8

ec2-user@master:~$ qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
node001                 linux-x64       8  7.74    6.8G    3.8G     0.0     0.0
node002                 linux-x64       8  7.93    6.8G    3.7G     0.0     0.0
node003                 linux-x64       8  7.68    6.8G    3.7G     0.0     0.0
node004                 linux-x64       8  7.86    6.8G    3.8G     0.0     0.0
node005                 linux-x64       8  7.87    6.8G    3.7G     0.0     0.0
node006                 linux-x64       8  7.66    6.8G    3.7G     0.0     0.0
node007                 linux-x64       8  0.01    6.8G  564.8M     0.0     0.0
node008                 linux-x64       8  0.01    6.8G  493.6M     0.0     0.0
node009                 linux-x64       8  0.02    6.8G  564.4M     0.0     0.0
node010                 linux-x64       8  7.85    6.8G    3.7G     0.0     0.0
node011                 linux-x64       8  7.53    6.8G    3.7G     0.0     0.0
node012                 linux-x64       8  7.57    6.8G    3.6G     0.0     0.0
node013                 linux-x64       8  7.71    6.8G    3.7G     0.0     0.0
node014                 linux-x64       8  7.49    6.8G    3.7G     0.0     0.0
node015                 linux-x64       8  7.51    6.8G    3.7G     0.0     0.0
node016                 linux-x64       8  7.50    6.8G    3.6G     0.0     0.0
node017                 linux-x64       8  7.89    6.8G    3.7G     0.0     0.0
node018                 linux-x64       8  7.50    6.8G    3.7G     0.0     0.0
node019                 linux-x64       8  7.52    6.8G    3.7G     0.0     0.0
node020                 linux-x64       8  7.68    6.8G    3.6G     0.0     0.0
node021                 linux-x64       8  7.16    6.8G    3.6G     0.0     0.0
node022                 linux-x64       8  6.99    6.8G    3.6G     0.0     0.0
node023                 linux-x64       8  6.80    6.8G    3.6G     0.0     0.0
node024                 linux-x64       8  7.20    6.8G    3.6G     0.0     0.0
node025                 linux-x64       8  6.86    6.8G    3.6G     0.0     0.0
node026                 linux-x64       8  7.24    6.8G    3.6G     0.0     0.0
node027                 linux-x64       8  6.88    6.8G    3.7G     0.0     0.0
node028                 linux-x64       8  6.28    6.8G    3.6G     0.0     0.0
node029                 linux-x64       8  7.42    6.8G    3.6G     0.0     0.0
node030                 linux-x64       8  0.10    6.8G  390.4M     0.0     0.0
node031                 linux-x64       8  0.06    6.8G  135.0M     0.0     0.0
node032                 linux-x64       8  0.04    6.8G  135.3M     0.0     0.0
node033                 linux-x64       8  0.07    6.8G  135.6M     0.0     0.0
node034                 linux-x64       8  0.10    6.8G  134.9M     0.0     0.0


I never saw anything unusual in that output.
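
(As an aside, for anyone who wants to scan listings like the ones above
programmatically rather than by eye, here is a rough Python sketch that pulls
the LOAD column out of qhost and flags hosts that look idle. This is not
StarCluster's own parser, and the 0.5 "idle" threshold is just an assumed
cut-off.)

#!/usr/bin/env python
# Rough sketch: run qhost, read the LOAD column, report hosts that look idle.
# Not StarCluster's code; the 0.5 threshold is an arbitrary assumption.
import subprocess

def idle_hosts(threshold=0.5):
    out = subprocess.check_output(["qhost"]).decode()
    idle = []
    for line in out.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and the "global" summary row.
        if len(fields) < 4 or not fields[0].startswith("node"):
            continue
        load = fields[3]
        if load != "-" and float(load) < threshold:
            idle.append((fields[0], float(load)))
    return idle

if __name__ == "__main__":
    for host, load in idle_hosts():
        print("%s looks idle (load %.2f)" % (host, load))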


On Wed, Sep 18, 2013 at 10:40 AM, Rajat Banerjee <rajatb at post.harvard.edu> wrote:

> Ryan,
> Could you put the output of qhost and qstat into a text file and send it
> back to the list? That's what feeds the load balancer those stats.
>
> Thanks,
> Rajat
>
>
> On Tue, Sep 17, 2013 at 11:47 PM, Ryan Golhar <ngsbioinformatics at gmail.com> wrote:
>
>> I'm running a cluster with over 800 jobs queued....and I'm running
>> loadbalance.  Every other query by loadbalance shows Avg job duration and
>> wait time of 0 secs.  Why is this?  It hasn't yet caused a problem, but
>> seems odd....
>>
>> >>> Loading full job history
>> Execution hosts: 19
>> Queued jobs: 791
>> Oldest queued job: 2013-09-17 22:19:23
>> Avg job duration: 3559 secs
>> Avg job wait time: 12389 secs
>> Last cluster modification time: 2013-09-18 00:11:31
>> >>> Not adding nodes: already at or above maximum (1)
>> >>> Sleeping...(looping again in 60 secs)
>>
>> Execution hosts: 19
>> Queued jobs: 791
>> Oldest queued job: 2013-09-17 22:19:23
>> Avg job duration: 0 secs
>> Avg job wait time: 0 secs
>> Last cluster modification time: 2013-09-18 00:11:31
>> >>> Not adding nodes: already at or above maximum (1)
>> >>> Sleeping...(looping again in 60 secs)
>>
>>
>>
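
For reference, below is a minimal sketch of how the "Avg job duration" and
"Avg job wait time" figures in that loadbalance output could be derived from
SGE accounting data (qacct -j). This is not StarCluster's actual load-balancer
code; the qacct field names and timestamp format are assumptions based on
typical Grid Engine output. It also illustrates one way a "0 secs" reading can
come about: when the accounting query returns no parsable finished jobs, there
is nothing to average and the sketch falls back to 0.

#!/usr/bin/env python
# Sketch: average job duration and wait time from qacct accounting records.
# Assumptions: "qacct -j" lists all finished jobs, records are separated by
# "====" lines, and timestamps look like "Wed Sep 18 17:38:44 2013".
import subprocess
from datetime import datetime

TIME_FMT = "%a %b %d %H:%M:%S %Y"

def _collect(record, durations, waits):
    try:
        qsub = datetime.strptime(record["qsub_time"], TIME_FMT)
        start = datetime.strptime(record["start_time"], TIME_FMT)
        end = datetime.strptime(record["end_time"], TIME_FMT)
    except (KeyError, ValueError):
        return  # incomplete or unparsable record
    durations.append((end - start).total_seconds())
    waits.append((start - qsub).total_seconds())

def job_stats():
    out = subprocess.check_output(["qacct", "-j"]).decode()
    durations, waits, record = [], [], {}
    for line in out.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
        if line.startswith("====") and record:
            _collect(record, durations, waits)  # flush previous record
            record = {}
    if record:
        _collect(record, durations, waits)
    avg = lambda xs: sum(xs) / len(xs) if xs else 0  # 0 when nothing to average
    return avg(durations), avg(waits)

if __name__ == "__main__":
    duration, wait = job_stats()
    print("Avg job duration: %d secs, avg wait: %d secs" % (duration, wait))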