[StarCluster] StarCluster SGE usage

Gavin W. Burris bug at sas.upenn.edu
Wed Oct 17 11:16:15 EDT 2012


Hi John,

The default configuration distributes jobs based on load, meaning
new jobs land on the least loaded node.  If you want to fill one node
before moving on to the next, you can change the load formula in the
scheduler configuration:
# qconf -msconf
load_formula    slots
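A toy Python model can make the difference between the two host orderings concrete. It is only a sketch under simplifying assumptions, not Grid Engine's scheduler: every host has the same number of slots, each running job adds a fixed 1.0 to a host's load, and `load_formula slots` is assumed to rank hosts by their remaining free slots, with lower values preferred. The `Host` class, `pick_host`, `place_jobs` and the host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    total_slots: int
    running: int = 0
    load: float = 0.0  # crude stand-in for the host's load average

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.running


def pick_host(hosts, policy):
    """Return the host a new single-slot job would land on, or None if all are full."""
    candidates = [h for h in hosts if h.free_slots > 0]
    if not candidates:
        return None
    if policy == "least_loaded":
        # Default behaviour: the lowest load wins, which spreads jobs out.
        return min(candidates, key=lambda h: (h.load, h.name))
    if policy == "fill_up":
        # Assumed effect of "load_formula slots": the host with the fewest
        # free slots wins, which packs jobs onto hosts that are already busy.
        return min(candidates, key=lambda h: (h.free_slots, h.name))
    raise ValueError(f"unknown policy: {policy}")


def place_jobs(n_jobs, policy, n_hosts=4, slots_per_host=8):
    """Submit n_jobs single-slot jobs and report how many run on each host."""
    hosts = [Host(f"node{i:03d}", slots_per_host) for i in range(1, n_hosts + 1)]
    for _ in range(n_jobs):
        host = pick_host(hosts, policy)
        if host is None:
            break  # every slot is taken; further jobs would wait in the queue
        host.running += 1
        host.load += 1.0  # toy assumption: each running job adds 1.0 of load
    return {h.name: h.running for h in hosts}


print(place_jobs(4, "least_loaded"))
# -> {'node001': 1, 'node002': 1, 'node003': 1, 'node004': 1}
print(place_jobs(4, "fill_up"))
# -> {'node001': 4, 'node002': 0, 'node003': 0, 'node004': 0}
```

Four jobs end up one per node under the load-based ordering, but all on node001 under the fill-up ordering, which is the behaviour change the load formula is meant to produce.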

If you are using a parallel environment, its allocation rule can also
be changed to fill a node before spilling onto the next one:
# qconf -mp orte
allocation_rule    $fill_up
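What `$fill_up` does with a parallel job's slot request can be sketched the same way. The Python below is a simplified model, not Grid Engine code: hosts are tried in a fixed, hypothetical order (the real scheduler orders them itself), and the `allocate` function is made up for illustration. For contrast it also models `$round_robin`, another Grid Engine allocation rule, which hands out one slot per host in turn.

```python
def allocate(request, free_slots, rule="$fill_up"):
    """Split a parallel job's slot request across hosts (toy model).

    free_slots maps host name -> free slots, in the order hosts are tried.
    Returns host name -> slots granted, or None if the request cannot fit.
    """
    if request > sum(free_slots.values()):
        return None  # not enough free slots anywhere; the job stays pending
    granted = {host: 0 for host in free_slots}
    remaining = request
    if rule == "$fill_up":
        # Take every free slot on a host before moving on to the next one.
        for host, free in free_slots.items():
            take = min(free, remaining)
            granted[host] = take
            remaining -= take
            if remaining == 0:
                break
    elif rule == "$round_robin":
        # Hand out one slot per host in turn, skipping hosts that are full.
        while remaining > 0:
            for host, free in free_slots.items():
                if remaining > 0 and granted[host] < free:
                    granted[host] += 1
                    remaining -= 1
    else:
        raise ValueError(f"unsupported allocation rule: {rule}")
    # Report only the hosts that actually received slots.
    return {host: n for host, n in granted.items() if n > 0}


free = {"node001": 8, "node002": 8, "node003": 8}
print(allocate(10, free))                  # -> {'node001': 8, 'node002': 2}
print(allocate(4, free, "$round_robin"))   # -> {'node001': 2, 'node002': 1, 'node003': 1}
```

In this model a 10-slot request takes all eight free slots on node001 before spilling two onto node002, whereas the round-robin rule spreads a 4-slot request over all three hosts.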

Because your jobs use a lot of RAM, you may want to consider making
memory consumable to prevent over-subscription.  A simpler option may be
to create an arbitrary consumable complex resource, say john_jobs, and
set it to the maximum number of jobs you want running at one time:
# qconf -mc
john_jobs jj INT <= YES YES 0 0
# qconf -me global
complex_values john_jobs=10

Then, when you submit a job, specify the resource:
$ qsub -l jj=1 ajob.sh

Each job submitted in this way will consume one count of john_jobs,
effectively limiting you to ten concurrent jobs across the whole cluster
(the value is set on the global host, not per node).
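The bookkeeping behind a consumable like john_jobs can be pictured with a small Python model: a capacity of 10 is set once (as with complex_values john_jobs=10), each job that asks for jj=1 draws one unit while it runs, and a job whose request cannot be met stays pending until a running job gives its unit back. This is an illustrative sketch only; the `Consumable` class and its method names are hypothetical, it dispatches strictly in submission order, and it is not how Grid Engine is implemented internally.

```python
from collections import deque


class Consumable:
    """Toy model of a cluster-wide consumable complex such as john_jobs."""

    def __init__(self, capacity):
        self.capacity = capacity  # like "complex_values john_jobs=10"
        self.in_use = 0
        self.running = []         # (job, units) pairs currently holding units
        self.pending = deque()    # (job, units) pairs waiting for units

    def submit(self, job, request=1):
        # Like "qsub -l jj=1": queue the job, then try to dispatch.
        self.pending.append((job, request))
        self._dispatch()

    def finish(self, job):
        # A finished job returns its units, which may let pending jobs start.
        for i, (name, request) in enumerate(self.running):
            if name == job:
                del self.running[i]
                self.in_use -= request
                break
        self._dispatch()

    def _dispatch(self):
        # Start pending jobs in submission order while enough units remain.
        while self.pending and self.in_use + self.pending[0][1] <= self.capacity:
            job, request = self.pending.popleft()
            self.running.append((job, request))
            self.in_use += request


jobs = Consumable(10)
for i in range(15):
    jobs.submit(f"job{i}")
print(jobs.in_use, len(jobs.pending))  # -> 10 5
jobs.finish("job0")
print(jobs.in_use, len(jobs.pending))  # -> 10 4
```

Because the capacity lives in one place, this models a cluster-wide cap; it says nothing about how many of those ten jobs share a single node.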

Cheers.


On 10/16/2012 06:32 PM, John St. John wrote:
> Thanks Jesse! 
> 
> This does seem to work. I don't need to define -pe in this case b/c the
> slots are actually limited per node.
> 
> My only problem with this solution is that all jobs are now limited to
> this hard-coded number of slots, and also when nodes are added to the
> cluster while it is running the file is modified and the line would need
> to be edited again. On other systems I have seen the ability to specify
> that a job will use a specific number of CPUs without being in a
> special parallel environment; I have seen the "-l ncpus=X" option
> working, but it doesn't seem to with the default StarCluster setup. Also,
> it looks like the "orte" parallel environment has some stuff very
> specific to MPI, and doesn't have a problem splitting the requested
> number of slots between multiple nodes, which I definitely don't want. I
> just want to limit the number of jobs per node, but be able to specify
> that at runtime.
> 
> It looks like the grid engine is somehow aware of the number of CPUs
> available on each node. I get this by running `qhost`:
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> master                  linux-x64       8  0.88   67.1G    1.5G     0.0     0.0
> node001                 linux-x64       8  0.36   67.1G  917.3M     0.0     0.0
> node002                 linux-x64       8  0.04   67.1G  920.4M     0.0     0.0
> node003                 linux-x64       8  0.04   67.1G  887.3M     0.0     0.0
> node004                 linux-x64       8  0.06   67.1G  911.4M     0.0     0.0
> 
> 
> So it seems like there should be a way to tell qsub that job X is using
> some subset of the available CPUs or RAM, so that it doesn't
> oversubscribe the node.
> 
> Thanks for your time!
> 
> Best,
> John
> 
> 
> 
> 
> 
> On Oct 16, 2012, at 2:12 PM, Jesse Lu <jesselu at stanford.edu
> <mailto:jesselu at stanford.edu>> wrote:
> 
>> You can modify the all.q queue to assign a fixed number of slots to
>> each node.
>>
>>   * If I remember correctly, "$ qconf -mq all.q" will bring up the
>>     configuration of the all.q queue in an editor. 
>>   * The "slots" attribute should hold a fairly long string such
>>     as "[node001=16],[node002=16],..."
>>   * Try replacing the entire string with a single number such as "2".
>>     This should assign each host to have only two slots.
>>   * Save the configuration and try a simple submission with the 'orte'
>>     parallel environment and let me know if it works.
>>
>> Jesse
>>
>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John
>> <johnthesaintjohn at gmail.com <mailto:johnthesaintjohn at gmail.com>> wrote:
>>
>>     Hello,
>>     I am having issues telling qsub to limit the number of jobs run at
>>     any one time on each node of the cluster. There are sometimes ways
>>     to do this with things like "qsub -l node=1:ppn=1" or "qsub -l
>>     procs=2" or something. I even tried "qsub -l slots=2" but that
>>     gave me an error and told me to use the parallel environment. When
>>     I tried to use the "orte" parallel environment like "-pe orte 2" I
>>     see "slots=2" in my qstat list, but everything gets executed on
>>     one node at the same parallelization as before. How do I limit the
>>     number of jobs per node? I am running a process that consumes a
>>     very large amount of ram.
>>
>>     Thanks,
>>     John
>>
>>
>>     _______________________________________________
>>     StarCluster mailing list
>>     StarCluster at mit.edu <mailto:StarCluster at mit.edu>
>>     http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>>
> 

-- 
Gavin W. Burris
Senior Systems Programmer
Information Security and Unix Systems
School of Arts and Sciences
University of Pennsylvania

