[StarCluster] Starcluster SGE usage

John St. John johnthesaintjohn at gmail.com
Thu Oct 18 13:07:52 EDT 2012


Whoops, what I meant to say is that I would like to hammer something out that gets the job done. I am on IRC now in the place you suggested (I think; I've never used IRC before).

On Oct 18, 2012, at 9:04 AM, Justin Riley <jtriley at MIT.EDU> wrote:

> Hey Guys,
> 
> Glad you figured out what needed to be changed in the SGE
> configuration. I've been meaning to add a bunch more options to the
> SGE Plugin to configure things like this along with other SGE tuning
> parameters for some time now but simply haven't had the time. If
> either of you are interested in working on a PR to do this that'd be
> awesome. All of the SGE magic is here:
> 
> https://github.com/jtriley/StarCluster/blob/develop/starcluster/plugins/sge.py
> 
> and here's the SGE install and parallel environment templates used by
> StarCluster:
> 
> https://github.com/jtriley/StarCluster/blob/develop/starcluster/templates/sge.py
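> 
> For reference, a plugin (the stock SGE plugin or a custom one built
> from the code above) gets wired into ~/.starcluster/config roughly
> like this; the sge_tuning name, module path, and cluster name below
> are hypothetical placeholders:
> 
> [plugin sge_tuning]
> SETUP_CLASS = mypackage.sgetuning.SGETuningPlugin
> 
> [cluster mycluster]
> PLUGINS = sge_tuning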
> 
> I'm happy to discuss the plugin and some of the changes that would be
> needed on IRC (freenode: #starcluster).
> 
> ~Justin
> 
> 
> On 10/18/2012 08:30 AM, Gavin W. Burris wrote:
>> Hi John,
>> 
>> You got it.  Keeping all of a job's slots on the same node requires
>> $pe_slots.  This is the same setting you would use for something
>> like OpenMP.
>> As for configuring the queue automatically, maybe there is an
>> option in the SGE plugin that we can place in the
>> ~/.starcluster/config file?  I'd like to know, too.  If not, we
>> could maybe add some code.  Or we could keep a shell script on a
>> persistent volume that runs the needed qconf commands after
>> starting a new head node.
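>> 
>> Something like this rough, untested sketch (the script path and PE
>> file name are placeholders; it assumes John's by_node PE from the
>> message below):
>> 
>> #!/bin/bash
>> # configure_sge.sh - run on the master after the cluster comes up
>> set -e
>> # add the by_node parallel environment only if it isn't there yet
>> if ! qconf -spl | grep -qx 'by_node'; then
>>     qconf -Ap /persistent/volume/by_node_pe.txt
>> fi
>> # attach the PE to all.q alongside the existing ones
>> qconf -mattr queue pe_list "make orte by_node" all.q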
>> 
>> Cheers.
>> 
>> 
>> On 10/17/2012 05:23 PM, John St. John wrote:
>>> Hi Gavin, thanks for pointing me in the right direction. I found
>>> a solution, though, that seems to work really well. Since "slots"
>>> is already set equal to the core count on each node, I just
>>> needed a parallel environment that lets me request a certain
>>> number of slots on a single node, rather than spread out across N
>>> nodes. Changing the allocation rule to "$fill_up" would probably
>>> still overflow onto multiple nodes in edge cases. The way to do
>>> this properly is with the $pe_slots allocation rule in the
>>> parallel environment config. Here is what I did:
>>> 
>>> $ qconf -sp by_node    (create this with qconf -ap [name])
>>> 
>>> pe_name            by_node
>>> slots              9999999
>>> user_lists         NONE
>>> xuser_lists        NONE
>>> start_proc_args    /bin/true
>>> stop_proc_args     /bin/true
>>> allocation_rule    $pe_slots
>>> control_slaves     TRUE
>>> job_is_first_task  TRUE
>>> urgency_slots      min
>>> accounting_summary FALSE
>>> 
>>> Then I modify the parallel environment list in all.q with
>>> qconf -mq all.q:
>>> 
>>> pe_list            make orte by_node
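>>> 
>>> With that in place, a job that needs, say, 8 slots on a single
>>> node can be submitted with something like the following (job.sh
>>> standing in for whatever script you actually run):
>>> 
>>> $ qsub -pe by_node 8 job.sh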
>>> 
>>> That does it! Wahoo!
>>> 
>>> Ok now the problem is that I want this done automatically
>>> whenever a cluster is booted up, and if a node is added I want to
>>> make sure these configurations aren't clobbered. Any suggestions
>>> on making that happen?
>>> 
>>> Thanks everyone for your time!
>>> 
>>> Best, John
>>> 
>>> 
>>> On Oct 17, 2012, at 8:16 AM, Gavin W. Burris <bug at sas.upenn.edu>
>>> wrote:
>>> 
>>>> Hi John,
>>>> 
>>>> The default configuration will distribute jobs based on load,
>>>> meaning new jobs land on the least loaded node.  If you want to
>>>> fill nodes, you can change the load formula on the scheduler
>>>> config:
>>>> 
>>>> # qconf -msconf
>>>> load_formula       slots
>>>> 
>>>> If you are using a parallel environment, the default can be
>>>> changed to fill a node, as well:
>>>> 
>>>> # qconf -mp orte
>>>> allocation_rule    $fill_up
>>>> 
>>>> You may want to consider making memory consumable to prevent
>>>> over-subscription.  An easy option may be to make an arbitrary
>>>> consumable complex resource, say john_jobs, and set it to the
>>>> max number you want running at one time:
>>>> 
>>>> # qconf -mc
>>>> john_jobs          jj        INT    <=    YES    YES    0    0
>>>> 
>>>> # qconf -me global
>>>> complex_values     john_jobs=10
>>>> 
>>>> Then, when you submit a job, specify the resource:
>>>> 
>>>> $ qsub -l jj=1 ajob.sh
>>>> 
>>>> Each job submitted in this way will consume one count of
>>>> john_jobs, effectively limiting you to ten.
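>>>> 
>>>> For the memory route, one untested possibility is to make the
>>>> stock h_vmem complex consumable and tell SGE how much memory
>>>> each exec host really has:
>>>> 
>>>> # qconf -mc
>>>> h_vmem             h_vmem    MEMORY    <=    YES    YES    0    0
>>>> 
>>>> # qconf -me node001
>>>> complex_values     h_vmem=64G
>>>> 
>>>> A job submitted with, say, "qsub -l h_vmem=16G bigjob.sh"
>>>> (bigjob.sh being whatever you run) will then only land on a host
>>>> with that much memory still unclaimed.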
>>>> 
>>>> Cheers.
>>>> 
>>>> 
>>>> On 10/16/2012 06:32 PM, John St. John wrote:
>>>>> Thanks Jesse!
>>>>> 
>>>>> This does seem to work. I don't need to define -pe in this
>>>>> case because the slots are actually limited per node.
>>>>> 
>>>>> My only problem with this solution is that all jobs are now
>>>>> limited to this hard-coded number of slots, and when nodes are
>>>>> added to the cluster while it is running, the queue
>>>>> configuration is modified and the line would need to be edited
>>>>> again. On other systems I have been able to specify that a job
>>>>> will use a specific number of CPUs without being in a special
>>>>> parallel environment; I have seen the "-l ncpus=X" option work,
>>>>> but it doesn't seem to with the default StarCluster setup. Also,
>>>>> it looks like the "orte" parallel environment has some settings
>>>>> very specific to MPI, and it doesn't have a problem splitting
>>>>> the requested number of slots between multiple nodes, which I
>>>>> definitely don't want. I just want to limit the number of jobs
>>>>> per node, but be able to specify that at runtime.
>>>>> 
>>>>> It looks like the grid engine is somehow aware of the number
>>>>> of CPUs available on each node. I get this by running `qhost`:
>>>>> 
>>>>> HOSTNAME   ARCH       NCPU  LOAD   MEMTOT   MEMUSE  SWAPTO  SWAPUS
>>>>> ------------------------------------------------------------------
>>>>> global     -             -     -        -        -       -       -
>>>>> master     linux-x64     8  0.88    67.1G     1.5G     0.0     0.0
>>>>> node001    linux-x64     8  0.36    67.1G   917.3M     0.0     0.0
>>>>> node002    linux-x64     8  0.04    67.1G   920.4M     0.0     0.0
>>>>> node003    linux-x64     8  0.04    67.1G   887.3M     0.0     0.0
>>>>> node004    linux-x64     8  0.06    67.1G   911.4M     0.0     0.0
>>>>> 
>>>>> 
>>>>> So it seems like there should be a way to tell qsub that job
>>>>> X is using some subset of the available CPUs or RAM, so that
>>>>> it doesn't oversubscribe the node.
>>>>> 
>>>>> Thanks for your time!
>>>>> 
>>>>> Best, John
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Oct 16, 2012, at 2:12 PM, Jesse Lu <jesselu at stanford.edu> wrote:
>>>>> 
>>>>>> You can modify the all.q queue to assign a fixed number of
>>>>>> slots to each node.
>>>>>> 
>>>>>> * If I remember correctly, "$ qconf -mq all.q" will bring up
>>>>>>   the configuration of the all.q queue in an editor.
>>>>>> * Under the "slots" attribute should be a semi-lengthy string
>>>>>>   such as "[node001=16],[node002=16],...".
>>>>>> * Try replacing the entire string with a single number such as
>>>>>>   "2" (see the example after this list). This should give each
>>>>>>   host only two slots.
>>>>>> * Save the configuration and try a simple submission with the
>>>>>>   'orte' parallel environment, and let me know if it works.
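>>>>>> 
>>>>>> For example, assuming you want two slots per host, the edited
>>>>>> attribute would read simply:
>>>>>> 
>>>>>> slots                 2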
>>>>>> 
>>>>>> Jesse
>>>>>> 
>>>>>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John <johnthesaintjohn at gmail.com> wrote:
>>>>>> 
>>>>>> Hello, I am having issues telling qsub to limit the number of
>>>>>> jobs run at any one time on each node of the cluster. There
>>>>>> are sometimes ways to do this with things like "qsub
>>>>>> There are sometimes ways to do this with things like "qsub
>>>>>> -l node=1:ppn=1" or "qsub -l procs=2" or something. I even
>>>>>> tried "qsub -l slots=2" but that gave me an error and told
>>>>>> me to use the parallel environment. When I tried to use the
>>>>>> "orte" parallel environment like "-pe orte 2" I see
>>>>>> "slots=2" in my qstat list, but everything gets executed
>>>>>> on one node at the same parallelization as before. How do I
>>>>>> limit the number of jobs per node? I am running a process
>>>>>> that consumes a very large amount of ram.
>>>>>> 
>>>>>> Thanks, John
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> Gavin W. Burris
>>>> Senior Systems Programmer
>>>> Information Security and Unix Systems
>>>> School of Arts and Sciences
>>>> University of Pennsylvania
>>> 
>>> 
>> 
> 



