[StarCluster] Starcluster SGE usage

John St. John johnthesaintjohn at gmail.com
Wed Oct 17 17:23:12 EDT 2012


Hi Gavin,
Thanks for pointing me in the right direction. I found a solution that seems to work really well. Since "slots" is already set equal to the core count on each node, I just needed a parallel environment that lets me request a certain number of slots on a single node, rather than having them spread across N nodes. Changing the allocation rule to $fill_up would probably still spill over onto multiple nodes in edge cases. The proper way to do this is with the $pe_slots allocation rule in the parallel environment configuration. Here is what I did:

qconf -sp by_node    (created initially with "qconf -ap by_node")

pe_name            by_node
slots              9999999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE


Then I modified the parallel environment list in all.q to include the new PE:
qconf -mq all.q
pe_list               make orte by_node
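
With that in place, a job that needs (for example) four slots that all
land on a single node can be submitted against the new parallel
environment with something like:

qsub -pe by_node 4 myjob.sh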

That does it! Wahoo!

OK, now the problem is that I want this configuration applied automatically whenever a cluster is booted, and I want to make sure it isn't clobbered when a node is added. Any suggestions on making that happen?
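
My best guess so far is that a StarCluster plugin is the right hook for this. Below is a rough, untested sketch of what I have in mind, assuming the ClusterSetup run/on_add_node hooks and the node.ssh.remote_file/execute calls from the plugin docs; the class name, file path, and overall approach are just my guesses:

# Rough sketch of a StarCluster plugin (untested) that creates the
# "by_node" parallel environment when the cluster comes up and
# re-applies it whenever a node is added.
from starcluster.clustersetup import ClusterSetup
from starcluster.logger import log

PE_NAME = "by_node"
PE_TEMPLATE = """\
pe_name            by_node
slots              9999999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
"""


class ByNodePESetup(ClusterSetup):
    def _configure_pe(self, master):
        # Drop the PE definition on the master and load it if it does not
        # already exist, then append it to all.q's pe_list (qconf -aattr
        # adds to the list without touching the rest of the queue config).
        pe_file = "/tmp/%s.pe" % PE_NAME
        f = master.ssh.remote_file(pe_file, "w")
        f.write(PE_TEMPLATE)
        f.close()
        master.ssh.execute(
            "qconf -sp %s >/dev/null 2>&1 || qconf -Ap %s" % (PE_NAME, pe_file))
        master.ssh.execute(
            "qconf -aattr queue pe_list %s all.q || true" % PE_NAME)

    def run(self, nodes, master, user, user_shell, volumes):
        log.info("Configuring %s parallel environment" % PE_NAME)
        self._configure_pe(master)

    def on_add_node(self, node, nodes, master, user, user_shell, volumes):
        # Re-apply in case SGE's config was touched when the node joined.
        self._configure_pe(master)

If that is roughly right, I assume it would be registered in ~/.starcluster/config under a [plugin ...] section whose SETUP_CLASS points at the class, and listed in the cluster template's plugins setting. Is that the intended approach, or is there a cleaner way to hook node additions?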

Thanks everyone for your time!

Best,
John


On Oct 17, 2012, at 8:16 AM, Gavin W. Burris <bug at sas.upenn.edu> wrote:

> Hi John,
> 
> The default configuration will distribute jobs based on load, meaning
> new jobs land on the least loaded node.  If you want to fill nodes, you
> can change the load formula in the scheduler config:
> # qconf -msconf
> load_formula    slots
> 
> If you are using a parallel environment, the default can be changed to
> fill a node, as well:
> # qconf -mp orte
> allocation_rule    $fill_up
> 
> You may want to consider making memory consumable to prevent
> over-subscription.  An easy option may be to make an arbitrary
> consumable complex resource, say john_jobs, and set it to the max number
> you want running at one time:
> # qconf -mc
> john_jobs jj INT <= YES YES 0 0
> # qconf -me global
> complex_values john_jobs=10
> 
> Then, when you submit a job, specify the resource:
> $ qsub -l jj=1 ajob.sh
> 
> Each job submitted in this way will consume one count of john_jobs,
> effectively limiting you to ten.
> 
> Cheers.
> 
> 
> On 10/16/2012 06:32 PM, John St. John wrote:
>> Thanks Jesse! 
>> 
>> This does seem to work. I don't need to define -pe in this case because
>> the slots are actually limited per node.
>> 
>> My only problem with this solution is that all jobs are now limited to
>> this hard-coded number of slots, and when nodes are added to the cluster
>> while it is running the all.q configuration is updated, so the slots
>> line would need to be edited again. On other systems I have seen the
>> ability to specify that a job will use a specific number of CPUs without
>> being in a special parallel environment, e.g. the "-l ncpus=X" option,
>> but that doesn't seem to work with the default StarCluster setup. Also,
>> the "orte" parallel environment has settings very specific to MPI, and
>> it has no problem splitting the requested number of slots across
>> multiple nodes, which I definitely don't want. I just want to limit the
>> number of jobs per node, but be able to specify that at submission
>> time.
>> 
>> It looks like the grid engine is somehow aware of the number of CPUs
>> available on each node. I get this by running `qhost`:
>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global                  -               -     -       -       -       -       -
>> master                  linux-x64       8  0.88   67.1G    1.5G     0.0     0.0
>> node001                 linux-x64       8  0.36   67.1G  917.3M     0.0     0.0
>> node002                 linux-x64       8  0.04   67.1G  920.4M     0.0     0.0
>> node003                 linux-x64       8  0.04   67.1G  887.3M     0.0     0.0
>> node004                 linux-x64       8  0.06   67.1G  911.4M     0.0     0.0
>> 
>> 
>> So it seems like there should be a way to tell qsub that job X is using
>> some subset of the available CPUs or RAM, so that it doesn't
>> oversubscribe the node.
>> 
>> Thanks for your time!
>> 
>> Best,
>> John
>> 
>> 
>> 
>> 
>> 
>> On Oct 16, 2012, at 2:12 PM, Jesse Lu <jesselu at stanford.edu> wrote:
>> 
>>> You can modify the all.q queue to assign a fixed number of slots to
>>> each node.
>>> 
>>>  * If I remember correctly, "$ qconf -mq all.q" will bring up the
>>>    configuration of the all.q queue in an editor. 
>>>  * Under the "slots" attribute should be a somewhat lengthy string
>>>    such as "[node001=16],[node002=16],..."
>>>  * Try replacing the entire string with a single number such as "2".
>>>    This should give each host only two slots.
>>>  * Save the configuration and try a simple submission with the 'orte'
>>>    parallel environment and let me know if it works.
>>> 
>>> Jesse
>>> 
>>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John
>>> <johnthesaintjohn at gmail.com> wrote:
>>> 
>>>    Hello,
>>>    I am having trouble telling qsub to limit the number of jobs run at
>>>    any one time on each node of the cluster. On some systems there are
>>>    ways to do this with options like "qsub -l node=1:ppn=1" or "qsub -l
>>>    procs=2". I even tried "qsub -l slots=2", but that gave me an error
>>>    and told me to use a parallel environment. When I tried the "orte"
>>>    parallel environment with "-pe orte 2", I see "slots=2" in my qstat
>>>    list, but everything gets executed on one node at the same level of
>>>    parallelization as before. How do I limit the number of jobs per
>>>    node? I am running a process that consumes a very large amount of
>>>    RAM.
>>> 
>>>    Thanks,
>>>    John
>>> 
>> 
> 
> -- 
> Gavin W. Burris
> Senior Systems Programmer
> Information Security and Unix Systems
> School of Arts and Sciences
> University of Pennsylvania



