[StarCluster] Starcluster SGE usage

John St. John johnthesaintjohn at gmail.com
Tue Oct 16 18:32:47 EDT 2012


Thanks Jesse! 

This does seem to work. I don't need to define -pe in this case b/c the slots are actually limited per node.

My only problem with this solution is that all jobs are now limited to this hard-coded number of slots. Also, when nodes are added to the cluster while it is running, the file is modified and the line would need to be edited again. On other systems I have seen the ability to specify that a job will use a specific number of CPUs without being in a special parallel environment, e.g. the "-l ncpus=X" option, but that doesn't seem to work with the default StarCluster setup. It also looks like the "orte" parallel environment has some settings very specific to MPI, and it has no problem splitting the requested number of slots between multiple nodes, which I definitely don't want. I just want to limit the number of jobs per node, but be able to specify that at runtime.
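For what it's worth, one approach I've seen on plain SGE (just a sketch — "smp" is a hypothetical PE name I made up, and I haven't verified this on a StarCluster image) is to create a dedicated parallel environment whose allocation_rule is $pe_slots, which forces all of a job's requested slots onto a single host instead of splitting them MPI-style:

```shell
# Sketch only: create a new parallel environment (opens an editor).
qconf -ap smp

# In the editor, set something like:
#   pe_name            smp
#   slots              999
#   allocation_rule    $pe_slots    # keep all requested slots on one host
#   control_slaves     FALSE
#   job_is_first_task  TRUE

# Attach the PE to the queue by adding "smp" to its pe_list line:
qconf -mq all.q

# Then a job can reserve N slots on a single node at submit time:
qsub -pe smp 4 myjob.sh
```

That would let the per-job slot count be chosen at qsub time rather than hard-coded in the queue configuration.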

It looks like the grid engine is somehow aware of the number of CPUs available on each node. I get this by running `qhost`:
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
master                  linux-x64       8  0.88   67.1G    1.5G     0.0     0.0
node001                 linux-x64       8  0.36   67.1G  917.3M     0.0     0.0
node002                 linux-x64       8  0.04   67.1G  920.4M     0.0     0.0
node003                 linux-x64       8  0.04   67.1G  887.3M     0.0     0.0
node004                 linux-x64       8  0.06   67.1G  911.4M     0.0     0.0


So it seems like there should be a way to tell qsub that job X is using some subset of the available CPUs or RAM, so that it doesn't oversubscribe the node.
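Since RAM is the real constraint here, another thing that might work (again a sketch — the memory values are examples, and I haven't tried this on StarCluster's default setup) is making a memory complex consumable, so the scheduler subtracts each job's declared request from the host's advertised total:

```shell
# Sketch: make h_vmem a consumable resource (opens the complex editor).
qconf -mc
#   Change the h_vmem line's "consumable" column to YES, e.g.:
#   h_vmem  h_vmem  MEMORY  <=  YES  YES  0  0

# Advertise each host's memory (repeat per node; 64G is an example):
qconf -me node001
#   add: complex_values  h_vmem=64G

# Jobs then declare their footprint, and SGE won't schedule more
# jobs onto a node than its advertised memory allows:
qsub -l h_vmem=16G myjob.sh
```

One caveat I'd want to check: h_vmem may also impose a hard per-process memory limit on the job, so a softer complex like virtual_free might be preferable.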

Thanks for your time!

Best,
John





On Oct 16, 2012, at 2:12 PM, Jesse Lu <jesselu at stanford.edu> wrote:

> You can modify the all.q queue to assign a fixed number of slots to each node.
> If I remember correctly, "$ qconf -mq all.q" will bring up the configuration of the all.q queue in an editor. 
> Under the "slots" attribute there should be a semi-lengthy string such as "[node001=16],[node002=16],..."
> Try replacing the entire string with a single number such as "2". This should assign each host to have only two slots.
> Save the configuration and try a simple submission with the 'orte' parallel environment and let me know if it works.
> Jesse
> 
> On Tue, Oct 16, 2012 at 1:37 PM, John St. John <johnthesaintjohn at gmail.com> wrote:
> Hello,
> I am having issues telling qsub to limit the number of jobs run at any one time on each node of the cluster. There are sometimes ways to do this with things like "qsub -l node=1:ppn=1" or "qsub -l procs=2". I even tried "qsub -l slots=2", but that gave me an error and told me to use the parallel environment. When I tried the "orte" parallel environment with "-pe orte 2", I see "slots=2" in my qstat list, but everything gets executed on one node at the same parallelization as before. How do I limit the number of jobs per node? I am running a process that consumes a very large amount of RAM.
> 
> Thanks,
> John
> 
> 
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
> 
