[StarCluster] Starcluster SGE usage

Justin Riley jtriley at MIT.EDU
Thu Oct 18 13:59:46 EDT 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi John,

Wow that was fast, thanks! I'll take a look tonight and give you some
feedback. Also I don't see you on #starcluster on IRC. Are you sure
you're connected to freenode? You can join via your browser if you like:

http://webchat.freenode.net/?channels=starcluster

Unfortunately I won't be online much for the rest of the day but please
feel free to join often/hang out until we get this merged.

~Justin

On 10/18/2012 01:55 PM, John St. John wrote:
> Ok just submitted a pull request. I modified the sge template
> "sge_pe_template" so that you can modify the allocation type in a
> new parallel environment. I modified the SGE plugin so that it
> makes by_node as well as ORTE, and by_node uses the $pe_slots
> allocation type. I have tested this out by creating a cluster
> (haven't tried adding/deleting nodes) and it seems to work. The
> changes are pretty minimal to get here so I feel pretty confident
> that I didn't add any new bugs.
> 
> Best, John
> 
> 
> On Oct 18, 2012, at 10:07 AM, John St. John
> <johnthesaintjohn at gmail.com> wrote:
> 
>> Whoops, what I meant to say is that I would like to hammer
>> something out that gets the job done. I am on IRC now in the
>> place you suggested (I think, never used IRC before).
>> 
>> On Oct 18, 2012, at 9:04 AM, Justin Riley <jtriley at MIT.EDU>
>> wrote:
>> 
> Hey Guys,
> 
> Glad you figured out what needed to be changed in the SGE 
> configuration. I've been meaning to add a bunch more options to
> the SGE Plugin to configure things like this along with other SGE
> tuning parameters for some time now but simply haven't had the
> time. If either of you are interested in working on a PR to do this
> that'd be awesome. All of the SGE magic is here:
> 
> https://github.com/jtriley/StarCluster/blob/develop/starcluster/plugins/sge.py
>
>  and here are the SGE install and parallel environment templates
> used by StarCluster:
> 
> https://github.com/jtriley/StarCluster/blob/develop/starcluster/templates/sge.py
>
>  I'm happy to discuss the plugin and some of the changes that would
> be needed on IRC (freenode: #starcluster).
> 
> ~Justin
> 
> 
> On 10/18/2012 08:30 AM, Gavin W. Burris wrote:
>>>>> Hi John,
>>>>> 
>>>>> You got it.  Keeping all on the same node requires
>>>>> $pe_slots.  This is the same setting you would use for
>>>>> something like OpenMP.
>>>>> 
>>>>> As for configuring the queue automatically, maybe there is
>>>>> an option in the SGE plugin that we can place in the 
>>>>> ~/.starcluster/config file?  I'd like to know, too.  If
>>>>> not, we could maybe add some code.  Or keep a shell script
>>>>> on a persistent volume that we run that does the needed
>>>>> qconf foo commands after starting a new head node.
>>>>> 
>>>>> Cheers.
>>>>> 
>>>>> 
>>>>> On 10/17/2012 05:23 PM, John St. John wrote:
>>>>>> Hi Gavin, Thanks for pointing me in the right direction.
>>>>>> I found a great solution though that seems to work really
>>>>>> well. Since the "slots" is already set up to be equal to
>>>>>> the core count on each node, I just needed access to a
>>>>>> parallel environment that allowed me to submit jobs to
>>>>>> nodes, but request a certain number of slots on a single
>>>>>> node rather than spread out across N nodes. Changing the
>>>>>> allocation rule to "fill" would probably still overflow
>>>>>> into multiple nodes at the edge case. The way to do this
>>>>>> properly is with the $pe_slots allocation rule in the
>>>>>> parallel environment config file. Here is what I did:
>>>>>> 
>>>>>> qconf -sp by_node (create this with qconf -ap [name])
>>>>>> 
>>>>>> pe_name            by_node
>>>>>> slots              9999999
>>>>>> user_lists         NONE
>>>>>> xuser_lists        NONE
>>>>>> start_proc_args    /bin/true
>>>>>> stop_proc_args     /bin/true
>>>>>> allocation_rule    $pe_slots
>>>>>> control_slaves     TRUE
>>>>>> job_is_first_task  TRUE
>>>>>> urgency_slots      min
>>>>>> accounting_summary FALSE
>>>>>> 
>>>>>> 
>>>>>> Then I modify the parallel environment list in all.q:
>>>>>>
>>>>>> qconf -mq all.q
>>>>>> pe_list               make orte by_node
>>>>>> 
>>>>>> That does it! Wahoo!
>>>>>> 
>>>>>> Ok now the problem is that I want this done
>>>>>> automatically whenever a cluster is booted up, and if a
>>>>>> node is added I want to make sure these configurations
>>>>>> aren't clobbered. Any suggestions on making that happen?
>>>>>> 
>>>>>> Thanks everyone for your time!
>>>>>> 
>>>>>> Best, John
>>>>>> 
>>>>>> 
>>>>>> On Oct 17, 2012, at 8:16 AM, Gavin W. Burris
>>>>>> <bug at sas.upenn.edu> wrote:
>>>>>> 
>>>>>>> Hi John,
>>>>>>> 
>>>>>>> The default configuration will distribute jobs based on
>>>>>>> load, meaning new jobs land on the least loaded node.
>>>>>>> If you want to fill nodes, you can change the load
>>>>>>> formula on the scheduler config:
>>>>>>>
>>>>>>> # qconf -msconf
>>>>>>> load_formula    slots
>>>>>>> 
>>>>>>> If you are using a parallel environment, the default
>>>>>>> can be changed to fill a node, as well:
>>>>>>>
>>>>>>> # qconf -mp orte
>>>>>>> allocation_rule    $fill_up
>>>>>>> 
>>>>>>> You may want to consider making memory consumable to
>>>>>>> prevent over-subscription.  An easy option may be to
>>>>>>> make an arbitrary consumable complex resource, say
>>>>>>> john_jobs, and set it to the max number you want
>>>>>>> running at one time:
>>>>>>>
>>>>>>> # qconf -mc
>>>>>>> john_jobs jj INT <= YES YES 0 0
>>>>>>> # qconf -me global
>>>>>>> complex_values john_jobs=10
>>>>>>> 
>>>>>>> Then, when you submit a job, specify the resource:
>>>>>>>
>>>>>>> $ qsub -l jj=1 ajob.sh
>>>>>>> 
>>>>>>> Each job submitted in this way will consume one count
>>>>>>> of john_jobs, effectively limiting you to ten.
>>>>>>> 
>>>>>>> Cheers.
>>>>>>> 
>>>>>>> 
>>>>>>> On 10/16/2012 06:32 PM, John St. John wrote:
>>>>>>>> Thanks Jesse!
>>>>>>>> 
>>>>>>>> This does seem to work. I don't need to define -pe in
>>>>>>>> this case b/c the slots are actually limited per
>>>>>>>> node.
>>>>>>>> 
>>>>>>>> My only problem with this solution is that all jobs
>>>>>>>> are now limited to this hard coded number of slots,
>>>>>>>> and also when nodes are added to the cluster while it
>>>>>>>> is running the file is modified and the line would
>>>>>>>> need to be edited again. On other systems I have seen
>>>>>>>> the ability to specify that a job will use a specific
>>>>>>>> number of CPUs without being in a special parallel
>>>>>>>> environment. I have seen the "-l ncpus=X" option
>>>>>>>> working, but it doesn't seem to with the default
>>>>>>>> starcluster setup. Also it looks like the "orte"
>>>>>>>> parallel environment has some stuff very specific to
>>>>>>>> MPI, and doesn't have a problem splitting the
>>>>>>>> requested number of slots between multiple nodes,
>>>>>>>> which I definitely don't want. I just want to limit
>>>>>>>> the number of jobs per node, but be able to specify
>>>>>>>> that at runtime.
>>>>>>>> 
>>>>>>>> It looks like the grid engine is somehow aware of the
>>>>>>>> number of CPUs available on each node. I get this
>>>>>>>> by running `qhost`:
>>>>>>>>
>>>>>>>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>> global                  -               -     -       -       -       -       -
>>>>>>>> master                  linux-x64       8  0.88   67.1G    1.5G     0.0     0.0
>>>>>>>> node001                 linux-x64       8  0.36   67.1G  917.3M     0.0     0.0
>>>>>>>> node002                 linux-x64       8  0.04   67.1G  920.4M     0.0     0.0
>>>>>>>> node003                 linux-x64       8  0.04   67.1G  887.3M     0.0     0.0
>>>>>>>> node004                 linux-x64       8  0.06   67.1G  911.4M     0.0     0.0
>>>>>>>> 
>>>>>>>> 
>>>>>>>> So it seems like there should be a way to tell qsub
>>>>>>>> that job X is using some subset of the available CPU,
>>>>>>>> or RAM, so that it doesn't oversubscribe the node.
>>>>>>>> 
>>>>>>>> Thanks for your time!
>>>>>>>> 
>>>>>>>> Best, John
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Oct 16, 2012, at 2:12 PM, Jesse Lu
>>>>>>>> <jesselu at stanford.edu> wrote:
>>>>>>>> 
>>>>>>>>> You can modify the all.q queue to assign a fixed
>>>>>>>>> number of slots to each node.
>>>>>>>>> 
>>>>>>>>> * If I remember correctly, "$ qconf -mq all.q" will
>>>>>>>>>   bring up the configuration of the all.q queue in an
>>>>>>>>>   editor.
>>>>>>>>> * Under the "slots" attribute should be a
>>>>>>>>>   semi-lengthy string such as
>>>>>>>>>   "[node001=16],[node002=16],..."
>>>>>>>>> * Try replacing the entire string with a single number
>>>>>>>>>   such as "2". This should give each host only two
>>>>>>>>>   slots.
>>>>>>>>> * Save the configuration and try a simple submission
>>>>>>>>>   with the 'orte' parallel environment and let me know
>>>>>>>>>   if it works.
>>>>>>>>> 
>>>>>>>>> Jesse
>>>>>>>>> 
>>>>>>>>> On Tue, Oct 16, 2012 at 1:37 PM, John St. John
>>>>>>>>> <johnthesaintjohn at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hello, I am having issues telling qsub to limit the
>>>>>>>>> number of jobs run at any one time on each node of
>>>>>>>>> the cluster. There are sometimes ways to do this
>>>>>>>>> with things like "qsub -l node=1:ppn=1" or "qsub -l
>>>>>>>>> procs=2" or something. I even tried "qsub -l
>>>>>>>>> slots=2" but that gave me an error and told me to
>>>>>>>>> use the parallel environment. When I tried to use
>>>>>>>>> the "orte" parallel environment like "-pe orte 2" I
>>>>>>>>> see "slots=2" in my qstat list, but everything gets
>>>>>>>>> executed on one node at the same parallelization as
>>>>>>>>> before. How do I limit the number of jobs per node?
>>>>>>>>> I am running a process that consumes a very large
>>>>>>>>> amount of RAM.
>>>>>>>>> 
>>>>>>>>> Thanks, John
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> _______________________________________________
>>>>>>>>> StarCluster mailing list
>>>>>>>>> StarCluster at mit.edu
>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Gavin W. Burris
>>>>>>> Senior Systems Programmer
>>>>>>> Information Security and Unix Systems
>>>>>>> School of Arts and Sciences
>>>>>>> University of Pennsylvania
>>>>>> 
>>>>>> 
>>>>> 
> 
>> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlCAQ5IACgkQ4llAkMfDcrmGLwCePiJF14n7wRBMe0B9TLN0hWY3
TcEAnRd3k/gKorVaynUDNM9uIZNMoGnt
=bsDl
-----END PGP SIGNATURE-----
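
For readers following the thread, the by_node setup discussed above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not StarCluster's actual template or the code from John's pull request: PE_TEMPLATE, render_pe and qconf_commands are hypothetical names, and the qconf calls are only built as argument lists, never executed.

```python
# Hypothetical sketch; PE_TEMPLATE, render_pe and qconf_commands are
# illustrative names, not StarCluster's real template or plugin API.

# Field values mirror the by_node definition quoted in the thread.
PE_TEMPLATE = """\
pe_name            %(name)s
slots              %(slots)d
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    %(allocation_rule)s
control_slaves     TRUE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
"""

# Named allocation rules accepted by SGE; a positive integer is also valid.
NAMED_RULES = ("$pe_slots", "$fill_up", "$round_robin")


def render_pe(name, allocation_rule="$pe_slots", slots=9999999):
    """Return the text of a parallel environment definition."""
    is_int_rule = allocation_rule.isdigit() and int(allocation_rule) > 0
    if allocation_rule not in NAMED_RULES and not is_int_rule:
        raise ValueError("unsupported allocation_rule: %r" % (allocation_rule,))
    return PE_TEMPLATE % {
        "name": name,
        "slots": slots,
        "allocation_rule": allocation_rule,
    }


def qconf_commands(pe_file, pe_name="by_node", queue="all.q"):
    """Build (but do not run) the qconf calls that register the PE."""
    return [
        # Add the parallel environment from a file instead of an editor.
        ["qconf", "-Ap", pe_file],
        # Append the PE to the queue's pe_list non-interactively.
        ["qconf", "-aattr", "queue", "pe_list", pe_name, queue],
    ]


if __name__ == "__main__":
    print(render_pe("by_node"))
    for cmd in qconf_commands("/tmp/by_node.pe"):
        print(" ".join(cmd))
```

With such a parallel environment in place, a job that needs, say, four slots on a single node would be submitted along the lines of "qsub -pe by_node 4 ajob.sh".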

