[StarCluster] queue list doesn't match /etc/hosts?

David Koppstein david.koppstein at gmail.com
Fri Jun 26 13:20:41 EDT 2015


Hi,

I recently noticed that my instance of starcluster stopped submitting jobs.
I had disabled jobs on the master node using
`/opt/sge6/bin/linux-x64/qmod -d all.q@master`, but the jobs were submitting
to other nodes fine until recently.

Interestingly, here is my output of `qstat -f`:

queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/0/8          1.01     linux-x64     d

So only one queue instance is listed, and it is the disabled one on the
master. However, in /etc/hosts, I see:

10.0.0.85 master
10.0.0.80 node018
10.0.0.124 node025
10.0.0.139 node039

So for some reason, the queue instances for the other nodes aren't
registered, even though the nodes exist and StarCluster still lists them as
part of the cluster when I run, for example, `starcluster listclusters`:

Cluster nodes:
     master running i-3d8a8cc2 52.7.83.124
    node018 running i-4331d5bd 52.0.84.150
    node025 running i-e9e10717 52.6.226.185
    node039 running i-e11afc1f 54.175.131.15
Total nodes: 4
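
To make the mismatch explicit, here is a minimal Python sketch (not part of
StarCluster or SGE) that compares the hostnames in /etc/hosts-style text with
the queue instances in `qstat -f`-style text. The function names are
hypothetical, the sample strings are copied from the listings above, and it
assumes each queue instance starts a line in the form `queue@host`.

```python
import re

# Sample inputs copied from the listings in this message.
ETC_HOSTS = """\
10.0.0.85 master
10.0.0.80 node018
10.0.0.124 node025
10.0.0.139 node039
"""

QSTAT_F = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@master                   BIP   0/0/8          1.01     linux-x64     d
"""


def hosts_from_etc_hosts(text, ignore=("localhost",)):
    """Return the set of hostnames in /etc/hosts-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        names.update(fields[1:])  # first field is the IP address
    return names - set(ignore)


def hosts_from_qstat_f(text):
    """Return the set of hosts that have a queue instance (queue@host)."""
    names = set()
    for line in text.splitlines():
        match = re.match(r"^\S+@(\S+)\s", line)
        if match:
            names.add(match.group(1))
    return names


def unregistered_hosts(etc_hosts_text, qstat_text):
    """Hosts known to /etc/hosts that have no queue instance in qstat -f."""
    known = hosts_from_etc_hosts(etc_hosts_text)
    queued = hosts_from_qstat_f(qstat_text)
    return sorted(known - queued)


missing = unregistered_hosts(ETC_HOSTS, QSTAT_F)
print(missing)  # ['node018', 'node025', 'node039']
```

On a live cluster the two inputs could instead be read from /etc/hosts and
from the captured output of `qstat -f`; any extra aliases in /etc/hosts would
need to be added to `ignore`.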

I'm also running a load balancer on the cluster, if that's relevant. Has
anyone seen this, or does anyone know what might cause it?

Cheers,
David