[StarCluster] Starcluster SGE usage

John St. John johnthesaintjohn at gmail.com
Thu Oct 18 13:55:02 EDT 2012


OK, just submitted a pull request. I modified the SGE template "sge_pe_template" so that you can set the allocation rule when creating a new parallel environment, and I modified the SGE plugin so that it creates a by_node parallel environment alongside orte, with by_node using the $pe_slots allocation rule. I have tested this by creating a cluster (haven't tried adding/deleting nodes yet) and it seems to work. The changes are minimal, so I'm fairly confident I didn't introduce any new bugs.
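
For anyone curious, the change boils down to parameterizing the allocation rule in the PE template. A minimal sketch of the idea (field names here are illustrative; the exact names in the PR may differ):

sge_pe_template = """
pe_name            %(pe_name)s
slots              %(num_slots)s
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    %(allocation_rule)s
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
"""

The plugin then substitutes $pe_slots for by_node and leaves orte's rule as it was.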

Best,
John


On Oct 18, 2012, at 10:07 AM, John St. John <johnthesaintjohn at gmail.com> wrote:

> Whoops, what I meant to say is that I would like to hammer out something that gets the job done. I am on IRC now in the channel you suggested (I think; I've never used IRC before).
> 
> On Oct 18, 2012, at 9:04 AM, Justin Riley <jtriley at MIT.EDU> wrote:
> 
>> 
>> Hey Guys,
>> 
>> Glad you figured out what needed to be changed in the SGE
>> configuration. I've been meaning for some time to add more options
>> to the SGE plugin for things like this, along with other SGE tuning
>> parameters, but simply haven't had the time. If either of you is
>> interested in working on a PR to do this, that'd be awesome. All of
>> the SGE magic is here:
>> 
>> https://github.com/jtriley/StarCluster/blob/develop/starcluster/plugins/sge.py
>> 
>> and here are the SGE install and parallel environment templates
>> used by StarCluster:
>> 
>> https://github.com/jtriley/StarCluster/blob/develop/starcluster/templates/sge.py
>> 
>> I'm happy to discuss the plugin and some of the changes that would be
>> needed on IRC (freenode: #starcluster).
>> 
>> ~Justin
>> 
>> 
>> On 10/18/2012 08:30 AM, Gavin W. Burris wrote:
>>> Hi John,
>>> 
>>> You got it.  Keeping all slots on the same node requires
>>> $pe_slots.  This is the same setting you would use for something
>>> like OpenMP.
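>>> 
>>> For example, an 8-thread OpenMP job could request all of its slots
>>> on one node with something like (script name is illustrative):
>>> 
>>> $ qsub -pe by_node 8 run_openmp.sh
>>> 
>>> where run_openmp.sh picks up the slot count from the NSLOTS
>>> variable SGE exports to the job:
>>> 
>>> export OMP_NUM_THREADS=$NSLOTS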
>>> 
>>> As for configuring the queue automatically, maybe there is an
>>> option in the SGE plugin that we can place in the
>>> ~/.starcluster/config file?  I'd like to know, too.  If not, we
>>> could add some code.  Or we could keep a shell script on a
>>> persistent volume that runs the needed qconf commands after
>>> starting a new head node.
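>>> 
>>> If we go the plugin route, here's an untested sketch of the idea
>>> using the standard StarCluster plugin interface (class name is
>>> made up; the qconf incantations are the ones you worked out
>>> below):
>>> 
>>> from starcluster.clustersetup import ClusterSetup
>>> 
>>> class ByNodePE(ClusterSetup):
>>>     """Add a $pe_slots PE so multi-slot jobs stay on one node."""
>>> 
>>>     def run(self, nodes, master, user, user_shell, volumes):
>>>         # Write the PE definition to a file and load it with
>>>         # qconf -Ap, the non-interactive form of qconf -ap.
>>>         pe = '\n'.join([
>>>             'pe_name            by_node',
>>>             'slots              9999999',
>>>             'user_lists         NONE',
>>>             'xuser_lists        NONE',
>>>             'start_proc_args    /bin/true',
>>>             'stop_proc_args     /bin/true',
>>>             'allocation_rule    $pe_slots',
>>>             'control_slaves     TRUE',
>>>             'job_is_first_task  TRUE',
>>>             'urgency_slots      min',
>>>             'accounting_summary FALSE',
>>>         ])
>>>         master.ssh.execute("echo '%s' > /tmp/by_node.pe" % pe)
>>>         master.ssh.execute('qconf -Ap /tmp/by_node.pe')
>>>         # Attach the new PE to all.q alongside the existing ones
>>>         master.ssh.execute(
>>>             'qconf -mattr queue pe_list "make orte by_node" all.q')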
>>> 
>>> Cheers.
>>> 
>>> 
>>> On 10/17/2012 05:23 PM, John St. John wrote:
>>>> Hi Gavin, thanks for pointing me in the right direction. I found
>>>> a solution that seems to work really well. Since the "slots"
>>>> attribute is already set equal to the core count on each node, I
>>>> just needed a parallel environment that lets me request a certain
>>>> number of slots on a single node, rather than spread out across N
>>>> nodes. Changing the allocation rule to "$fill_up" would probably
>>>> still overflow into multiple nodes in the edge case. The proper
>>>> way to do this is with the $pe_slots allocation rule in the
>>>> parallel environment config file. Here is what I did:
>>>> 
>>>> qconf -sp by_node    (create this with qconf -ap [name])
>>>> 
>>>> pe_name            by_node
>>>> slots              9999999
>>>> user_lists         NONE
>>>> xuser_lists        NONE
>>>> start_proc_args    /bin/true
>>>> stop_proc_args     /bin/true
>>>> allocation_rule    $pe_slots
>>>> control_slaves     TRUE
>>>> job_is_first_task  TRUE
>>>> urgency_slots      min
>>>> accounting_summary FALSE
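>>>> 
>>>> Submitting against it looks like (job script name illustrative):
>>>> 
>>>> $ qsub -pe by_node 4 job.sh
>>>> 
>>>> which grants all four slots on a single node instead of spreading
>>>> them out.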
>>>> 
>>>> 
>>>> Then I modify the parallel environment list in all.q:
>>>> 
>>>> qconf -mq all.q
>>>> pe_list            make orte by_node
>>>> 
>>>> That does it! Wahoo!
>>>> 
>>>> OK, now the problem is that I want this done automatically
>>>> whenever a cluster is booted, and if a node is added I want to
>>>> make sure these configurations aren't clobbered. Any suggestions
>>>> on making that happen?
>>>> 
>>>> Thanks everyone for your time!
>>>> 
>>>> Best, John
>>>> 
>>>> 
>>>> On Oct 17, 2012, at 8:16 AM, Gavin W. Burris <bug at sas.upenn.edu>
>>>> wrote:
>>>> 
>>>>> Hi John,
>>>>> 
>>>>> The default configuration will distribute jobs based on load,
>>>>> meaning new jobs land on the least-loaded node.  If you want to
>>>>> fill nodes, you can change the load formula in the scheduler
>>>>> config:
>>>>> 
>>>>> # qconf -msconf
>>>>> load_formula    slots
>>>>> 
>>>>> If you are using a parallel environment, the default allocation
>>>>> rule can be changed to fill a node, as well:
>>>>> 
>>>>> # qconf -mp orte
>>>>> allocation_rule    $fill_up
>>>>> 
>>>>> You may want to consider making memory consumable to prevent
>>>>> oversubscription.  An easy option is to create an arbitrary
>>>>> consumable complex resource, say john_jobs, and set it to the
>>>>> max number you want running at one time:
>>>>> 
>>>>> # qconf -mc
>>>>> john_jobs  jj  INT  <=  YES  YES  0  0
>>>>> # qconf -me global
>>>>> complex_values john_jobs=10
>>>>> 
>>>>> Then, when you submit a job, request the resource:
>>>>> 
>>>>> $ qsub -l jj=1 ajob.sh
>>>>> 
>>>>> Each job submitted in this way will consume one count of
>>>>> john_jobs, effectively limiting you to ten.
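>>>>> 
>>>>> The memory version of the same idea (a sketch; the numbers are
>>>>> illustrative) is to make virtual_free consumable, give each exec
>>>>> host its real total, and request memory per job:
>>>>> 
>>>>> # qconf -mc
>>>>> virtual_free  vf  MEMORY  <=  YES  YES  0  0
>>>>> # qconf -me node001
>>>>> complex_values virtual_free=64G
>>>>> 
>>>>> $ qsub -l virtual_free=8G bigjob.sh
>>>>> 
>>>>> The scheduler then stops placing jobs on a node once its memory
>>>>> is fully reserved.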
>>>>> 
>>>>> Cheers.
>>>>> 
>>>>> 
>>>>> On 10/16/2012 06:32 PM, John St. John wrote:
>>>>>> Thanks Jesse!
>>>>>> 
>>>>>> This does seem to work. I don't need to define -pe in this
>>>>>> case because the slots are actually limited per node.
>>>>>> 
>>>>>> My only problem with this solution is that all jobs are now
>>>>>> limited to this hard-coded number of slots, and when nodes are
>>>>>> added while the cluster is running, the queue config is
>>>>>> modified and the line would need to be edited again. On other
>>>>>> systems I have been able to specify that a job will use a
>>>>>> specific number of CPUs without a special parallel environment;
>>>>>> I have seen the "-l ncpus=X" option work, but it doesn't seem
>>>>>> to with the default StarCluster setup. Also, the "orte"
>>>>>> parallel environment has some settings very specific to MPI,
>>>>>> and it has no problem splitting the requested number of slots
>>>>>> between multiple nodes, which I definitely don't want. I just
>>>>>> want to limit the number of jobs per node, but be able to
>>>>>> specify that at runtime.
>>>>>> 
>>>>>> It looks like the grid engine is aware of the number of CPUs
>>>>>> available on each node. I get this by running `qhost`:
>>>>>> 
>>>>>> HOSTNAME  ARCH       NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>>> ---------------------------------------------------------------
>>>>>> global    -          -     -     -       -       -       -
>>>>>> master    linux-x64  8     0.88  67.1G   1.5G    0.0     0.0
>>>>>> node001   linux-x64  8     0.36  67.1G   917.3M  0.0     0.0
>>>>>> node002   linux-x64  8     0.04  67.1G   920.4M  0.0     0.0
>>>>>> node003   linux-x64  8     0.04  67.1G   887.3M  0.0     0.0
>>>>>> node004   linux-x64  8     0.06  67.1G   911.4M  0.0     0.0
>>>>>> 
>>>>>> 
>>>>>> So it seems like there should be a way to tell qsub that job X
>>>>>> is using some subset of the available CPUs or RAM, so that it
>>>>>> doesn't oversubscribe the node.
>>>>>> 
>>>>>> Thanks for your time!
>>>>>> 
>>>>>> Best, John
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Oct 16, 2012, at 2:12 PM, Jesse Lu <jesselu at stanford.edu>
>>>>>> wrote:
>>>>>> 
>>>>>>> You can modify the all.q queue to assign a fixed number of
>>>>>>> slots to each node.
>>>>>>> 
>>>>>>> * If I remember correctly, "$ qconf -mq all.q" will bring up
>>>>>>>   the configuration of the all.q queue in an editor.
>>>>>>> * Under the "slots" attribute should be a fairly long string
>>>>>>>   such as "[node001=16],[node002=16],...".
>>>>>>> * Try replacing the entire string with a single number such as
>>>>>>>   "2". This should give each host only two slots.
>>>>>>> * Save the configuration and try a simple submission with the
>>>>>>>   'orte' parallel environment, and let me know if it works.
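>>>>>>> 
>>>>>>> A quick way to check (script name is illustrative):
>>>>>>> 
>>>>>>> $ qsub -pe orte 2 test_job.sh
>>>>>>> $ qstat -g t
>>>>>>> 
>>>>>>> qstat -g t shows which queue instances the granted slots
>>>>>>> landed on.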
>>>>>>> 
>>>>>>> Jesse
>>>>>>> 
>>>>>>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John
>>>>>>> <johnthesaintjohn at gmail.com> wrote:
>>>>>>> 
>>>>>>> Hello, I am having trouble telling qsub to limit the number
>>>>>>> of jobs run at any one time on each node of the cluster. On
>>>>>>> some systems this can be done with things like "qsub -l
>>>>>>> node=1:ppn=1" or "qsub -l procs=2". I even tried "qsub -l
>>>>>>> slots=2", but that gave me an error telling me to use a
>>>>>>> parallel environment. When I used the "orte" parallel
>>>>>>> environment with "-pe orte 2", I saw "slots=2" in my qstat
>>>>>>> list, but everything executed on one node at the same level
>>>>>>> of parallelism as before. How do I limit the number of jobs
>>>>>>> per node? I am running a process that consumes a very large
>>>>>>> amount of RAM.
>>>>>>> 
>>>>>>> Thanks, John
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> --
>>>>> Gavin W. Burris
>>>>> Senior Systems Programmer
>>>>> Information Security and Unix Systems
>>>>> School of Arts and Sciences
>>>>> University of Pennsylvania
>>>> 
>>>> 
>>> 
>> 
> 



