[StarCluster] Delay when using Sun Grid Engine

Justin Riley jtriley at MIT.EDU
Wed Oct 17 14:00:49 EDT 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jesse/Rayson,

Sorry for my absence on this. The latest version of OGS is included in
the upcoming 12.04 AMIs. I'm finishing up some testing of the
12.04 AMIs and will release them soon. I'm happy to say that
ge2011.11u1p1 works great.

Also, it's useful to know about the load_report_time variable, since
I've seen exactly the same delay before a PE job is reported as
finished. I'll likely tweak this in the default StarCluster SGE
setup.

~Justin
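
For anyone who wants to try the load_report_time change discussed in this
thread without editing the configuration by hand, here is a minimal sketch,
not a tested recipe. set_conf_param is a hypothetical helper (not part of
Grid Engine) that rewrites one parameter in text shaped like the output of
qconf -sconf. The commented usage assumes that qconf is on your PATH, that
you are an SGE manager, and that qconf -Mconf derives the configuration
name from the file name; the value 00:00:05 is only an illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite one parameter in text shaped like
# `qconf -sconf` output (one "name value" pair per line).
# Reads the configuration on stdin, writes the edited copy to stdout.
set_conf_param() {
    awk -v key="$1" -v val="$2" '
        # Replace the line whose first field is the requested parameter.
        $1 == key { printf "%-28s %s\n", key, val; next }
        # Pass every other line (including continuation lines) through.
        { print }
    '
}

# Usage sketch on a live cluster (assumptions as stated above):
#   tmp=$(mktemp -d)
#   qconf -sconf global | set_conf_param load_report_time 00:00:05 > "$tmp/global"
#   qconf -Mconf "$tmp/global"
#   qconf -sconf global | grep load_report_time   # confirm the new value
```

Keeping the edit in a pipe makes it easy to diff the result against the
current configuration before loading it.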

On 09/05/2012 07:32 PM, Jesse Lu wrote:
> Hi Rayson,
> 
> Let me first say thanks for OGS, it's a super useful tool!
> 
> So, an update.... I realized that the parameter was
> load_report_time in the global configuration. The delay was
> basically exactly load_report_time, and so I have set it to 0, and
> the delay is basically gone...
> 
> Rayson, here is my global configuration (qconf -sconf), any
> comments? Particularly, is it okay to have a value of zero for
> load_report_time?
> 
> $ qconf -sconf
> #global:
> execd_spool_dir              /opt/sge6/default/spool
> mailer                       /bin/mail
> xterm                        /usr/bin/X11/xterm
> load_sensor                  none
> prolog                       none
> epilog                       none
> shell_start_mode             posix_compliant
> login_shells                 sh,bash,ksh,csh,tcsh
> min_uid                      0
> min_gid                      0
> user_lists                   none
> xuser_lists                  none
> projects                     none
> xprojects                    none
> enforce_project              false
> enforce_user                 auto
> load_report_time             00:00:00
> max_unheard                  00:05:00
> reschedule_unknown           02:00:00
> loglevel                     log_warning
> administrator_mail           none at none.edu
> set_token_cmd                none
> pag_cmd                      none
> token_extend_time            none
> shepherd_cmd                 none
> qmaster_params               none
> execd_params                 none
> reporting_params             accounting=false reporting=false \
>                              flush_time=00:00:15 joblog=false sharelog=00:00:00
> finished_jobs                100
> gid_range                    20000-20100
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
> max_aj_instances             2000
> max_aj_tasks                 75000
> max_u_jobs                   0
> max_jobs                     0
> max_advance_reservations     0
> auto_user_oticket            0
> auto_user_fshare             0
> auto_user_default_project    none
> auto_user_delete_time        86400
> delegated_file_staging       false
> reprioritize                 0
> jsv_url                      none
> jsv_allowed_mod              ac,h,i,e,o,j,M,N,p,w
> 
> 
> On Wed, Sep 5, 2012 at 12:52 PM, Rayson Ho <raysonlogin at gmail.com> wrote:
> 
> On Wed, Sep 5, 2012 at 1:10 PM, Jesse Lu <jesselu at stanford.edu> wrote:
>> However, if I run in a parallel environment (e.g. qsub -pe orte ...) then
>> there is an approximately 40 sec delay after job completion. That is to say,
>> the job has technically finished, although qstat still lists it as running,
>> and subsequent jobs are held up. Any ideas?
> 
> That's fixed in the update release.
> 
> Rayson
> 
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
> 
> 
>> 
>> Thanks in advance!
>> 
>> 
>> On Tue, Sep 4, 2012 at 5:33 PM, Rayson Ho <raysonlogin at gmail.com> wrote:
>>> 
>>> That's the default scheduling time, and if you really want the 
>>> scheduler to react to your qsub requests ASAP, you can turn on 
>>> "scheduling-on-demand":
>>> 
>>> http://gridscheduler.sourceforge.net/howto/tuning.html
>>> 
>>> And in OGS/GE 2011.11 u1 p1 (we need a better name), the time it takes
>>> to report job done should be reduced.
>>> 
>>> Rayson
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Sep 4, 2012 at 8:05 PM, Jesse Lu <mr.jesselu at gmail.com> wrote:
>>>> Yes! Exactly.
>>>> 
>>>> -- Jesse
>>>> ________________________________
>>>> On Sep 4, 2012 4:19 PM, Rayson Ho <raysonlogin at gmail.com> wrote:
>>>> 
>>>> Hi Jesse,
>>>> 
>>>> Are you referring to the scheduling time of Grid Engine??
>>>> 
>>>> Rayson
>>>> 
>>>> 
>>>> 
>>>> On Tue, Sep 4, 2012 at 6:37 PM, Jesse Lu <jesselu at stanford.edu> wrote:
>>>>> Hi StarCluster users,
>>>>> 
>>>>> I've noticed long delays with Sun Grid Engine when submitting jobs and
>>>>> especially after job execution. Even running a simple "hostname" job
>>>>> takes several seconds. Moreover, running an MPI version of "hostname" can
>>>>> take 2 minutes!!
>>>>> 
>>>>> Can someone help me get rid of this delay? Thank you.
>>>>> 
>>>>> Jesse
>>>>> 
>>>>> _______________________________________________
>>>>> StarCluster mailing list
>>>>> StarCluster at mit.edu
>>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>>> 
>> 
>> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlB+8lEACgkQ4llAkMfDcrnyGwCeLtu7X6gljri93H2XHsQVI8HM
0Q4AnAq/tuq9H+2mENE2ZtgzqdlXxS1U
=bn9p
-----END PGP SIGNATURE-----

