[StarCluster] cluster with 30 nodes, 2 slots each, 50 job max?

Justin Riley jtriley at MIT.EDU
Mon Oct 22 10:50:31 EDT 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi John,

You can look through your debug log for errors that occurred while
adding nodes to SGE: $HOME/.starcluster/logs/debug.log. Also, when
this happens, check whether the nodes are listed in the queue
configuration (qconf) for all.q:

$ starcluster sshmaster mycluster
$ qconf -sq all.q

If you can still get this info I'd like to take a look to see what
happened.
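
In case it helps, here is a rough sketch of both checks as small shell
helpers. The function names, the 'error|traceback' search pattern, and
the expected_nodes.txt file are my own assumptions for illustration;
they are not part of StarCluster or SGE. Note that 50 running jobs at
2 slots each is 25 nodes, which matches the 5 nodes SGE does not
recognize.

```shell
#!/bin/sh
# Hypothetical helpers, not part of StarCluster or SGE.

# Print numbered lines of log file $1 that mention an error or a Python
# traceback. The search pattern is a guess at how failures are logged.
find_log_errors() {
    grep -n -i -E 'error|traceback' "$1" || true
}

# Print hosts listed in file $1 (expected nodes, one per line) that do
# not appear in file $2 (hosts SGE knows about, one per line).
missing_hosts() {
    grep -v -x -F -f "$2" "$1" || true
}

# Example usage (kept as comments so sourcing this file runs nothing):
#   On the machine where you run starcluster:
#     find_log_errors "$HOME/.starcluster/logs/debug.log"
#   On the master, with expected_nodes.txt written by hand and assuming
#   SGE reports the same hostnames as that file:
#     qconf -sel > sge_hosts.txt
#     missing_hosts expected_nodes.txt sge_hosts.txt
```

Any host printed by missing_hosts is a node StarCluster started but
SGE has no execution host entry for.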

~Justin

On 10/18/2012 06:55 PM, John St. John wrote:
> The issue appears to be that the nodes were added with
> `starcluster loadbalance` and for some reason SGE was not
> configured properly for one round of 5 nodes that were added. They
> show up as being part of the cluster in starcluster, but SGE does
> not recognize them. Wonder what went wrong?
> 
> On Oct 18, 2012, at 3:25 PM, John St. John
> <johnthesaintjohn at gmail.com> wrote:
> 
>> Hello, I am running a cluster with 30 nodes that have 2 slots
>> each. That should let up to 60 single-slot jobs run at a time.
>> For some reason, though, once 50 jobs are running, the system
>> just queues any further jobs. I have tried kicking off simple
>> sleep jobs (qsub -V -b y -cwd sleep 10) to see if those can get
>> past the 50-job wall I am hitting, but no luck. I do not see
>> anything odd about my settings. Is this limit hard-coded
>> somewhere that I can change? Has anyone been able to run more
>> than 50 jobs at a time with starcluster?
>> 
>> Thanks! John
> 
> 
> 
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlCFXTYACgkQ4llAkMfDcrlI5gCdEOd20CNzrJTDUHnG6ig+cf97
HKYAoIf0v1JdfYjclL2quZS1jRrfsqdV
=QU07
-----END PGP SIGNATURE-----

