[StarCluster] workers go idle until a new worker is added ... ?

Mike Cariaso mike.cariaso at keygene.com
Mon Aug 22 11:13:40 EDT 2016


Using the latest version from

https://github.com/datacratic/StarCluster/blob/vanilla_improvements/starcluster/plugins/sge.py


I start a master node with zero workers and put an array job into the queue. I then gradually add worker nodes. A new worker accepts as many tasks as its slots allow, but after they complete it never picks up additional work. When I add another worker machine, it accepts some tasks and runs them successfully, but never goes back for more. Usually during this time one of the idle earlier machines will also pick up some more tasks, but once those are finished it again sits waiting.


qstat -j 1.19 shows me 'unable to find job file "/opt/sge6/default/spool/exec_spool_local/mynew1-node002/job_scripts/1"'


and it's true that no file is there. When I add a new machine, the job appears, suggesting this isn't a file permission issue.
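To check several nodes for the same symptom, the message can be split into its parts. This is only a minimal sketch: it assumes the path always follows the layout seen in the message above (spool directory, node name, job_scripts, job id), and the function name parse_missing_job_file is made up for illustration, not part of SGE or StarCluster.

```python
import re
from pathlib import PurePosixPath

# Message copied from the qstat output quoted above.
MESSAGE = ('unable to find job file '
           '"/opt/sge6/default/spool/exec_spool_local/mynew1-node002/job_scripts/1"')


def parse_missing_job_file(message):
    """Return the path, node name and job id from the message, or None."""
    match = re.search(r'unable to find job file "([^"]+)"', message)
    if match is None:
        return None
    path = PurePosixPath(match.group(1))
    # Assumed layout: <spool dir>/<node>/job_scripts/<job id>
    return {
        "path": str(path),
        "node": path.parent.parent.name,
        "job_id": path.name,
    }


if __name__ == "__main__":
    print(parse_missing_job_file(MESSAGE))
```

The returned node name and path could then be used to look on that node and confirm whether the script is really missing there.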


some nodes remain out of action.

starcluster addnode -x -a nodename clustername

doesn't seem to help.



Michael Cariaso
michael.cariaso at keygene.com
Bioinformatician
http://www.keygene.com