[StarCluster] SGE slow with 50k+ jobs
Chris Dagdigian
dag at bioteam.net
Sun Mar 30 07:35:40 EDT 2014
The #1 suggestion is to change the default scheduling setup over to the
settings for "on demand scheduling" - I believe you can do that live on
a running cluster. Google can help you with the exact parameters, but it
basically involves switching from the looping/cyclical scheduler
behavior to an event-based method where SGE runs a scheduling/dispatch
cycle each time a job enters or leaves the system. This is purpose-built
for the "many short jobs" use case.
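Roughly speaking, the knobs live in the scheduler configuration (editable
live with `qconf -msconf`). The parameter names below are real sched_conf
settings; the values are just a sketch of a high-throughput setup:

```
# scheduler configuration fragment (edit live with: qconf -msconf)
schedule_interval    0:0:1    # minimum cycle; the flush_* settings below
                              # make dispatch event-driven
flush_submit_sec     1        # run a dispatch cycle ~1s after each submit
flush_finish_sec     1        # ...and ~1s after each job finishes
schedd_job_info      false    # skip per-job scheduler info; a big win
                              # with tens of thousands of queued jobs
```

Turning off schedd_job_info means `qstat -j <jobid>` loses its "why isn't
this job running" detail, but it takes a lot of load off qmaster.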
The #2 suggestion is to make sure you use BerkeleyDB-based spooling
(this is one scenario where I'll switch from my usual preference for
classic spooling) AND make sure that you are spooling to local disk on
each node instead of spooling back to a shared filesystem. These
settings cannot easily be changed on a live system, so in StarCluster
land you might have to tweak the source code to come up with a different
installation template that would put these settings in. (Note that I'm
not sure what SC puts in by default, so there is a chance that this is
already set up the way you'd want ...)
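For reference, the qmaster spooling method is recorded in the cell's
bootstrap file, and the per-node execd spool location is in the global
cluster configuration. The parameter names below are the real ones; the
paths are only illustrative:

```
# $SGE_ROOT/$SGE_CELL/common/bootstrap (normally fixed at install time)
spooling_method    berkeleydb
spooling_lib       libspoolb
spooling_params    /var/spool/sge/spooldb    # local disk, not NFS

# global cluster config (qconf -mconf): per-node spool dirs on local disk
execd_spool_dir    /var/spool/sge
```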
Other stuff:
- SGE array jobs are way more efficient than single jobs. Instead of
70,000 jobs, can you submit a single array job that has *70,000 tasks*
instead? Or 7 jobs containing 10,000 tasks each, etc.?
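As a sketch (the task.sh script and its echo payload are just
placeholders), each task reads $SGE_TASK_ID to find its piece of work:

```shell
# hypothetical job script; SGE sets SGE_TASK_ID to the task number (1..N)
cat > task.sh <<'EOF'
#!/bin/sh
echo "processing work item ${SGE_TASK_ID}"
EOF
chmod +x task.sh

# one submission with 70,000 tasks, instead of 70,000 qsub calls:
#   qsub -t 1-70000 task.sh
```

The whole array is one job object to qmaster, which is also why a single
qdel on the array's job ID kills every task at once.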
- Is there any way to batch up or collect your short jobs into bigger
collections so that the average job runtime is longer? SGE is not the
best at jobs that run for mere seconds or less - most people would batch
those together to get at least a 60-90 second runtime ...
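One common pattern (file names and the chunk size here are illustrative)
is to split a list of one-line commands into chunks and submit each
chunk as a single job:

```shell
# demo input: 250 short "jobs", one command per line
# (in practice this would be your real command list)
seq 1 250 | sed 's/^/echo job /' > commands.txt

# group into chunks of 100 so each submitted job runs long enough
# to amortize the scheduler overhead
split -l 100 commands.txt batch_

# each chunk then becomes ONE SGE job instead of 100, e.g.:
#   for f in batch_*; do qsub -b y -cwd sh "$f"; done
ls batch_*
```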
-Chris
Jacob Barhak wrote:
> Hi to SGE experts,
>
> This is less of a StarCluster specific issue. It is more of an SGE issue I encountered and was hoping someone here can help with.
>
> My system runs many, many small jobs - tens of thousands. When I need a rushed solution to reach a deadline I use StarCluster. However, if I have time, I run the simulations on a single 8-core machine that has SGE installed on Ubuntu 12.04. This machine is new and fast, with an SSD drive and a fresh install, yet I am encountering issues.
>
> 1. When I launch about 70k jobs, submitting a single new job to the queue takes a while - about a second or so, compared to fractions of a second when the queue is empty.
>
> 2. Deleting all the jobs from the queue using qdel -u username takes a long time. It reports about 24 deletes to the screen every few seconds - at this rate it will take hours to delete the entire queue. It is still deleting while I am writing these words. Way too much time.
>
> 3. The system was working OK for a few days yet now has trouble with qmaster. It reports the following:
> error: commlib error: got select error (connection refused)
> unable to send message to qmaster using port 6444 on host 'localhost': got send error.
> Also, qmon reported that it cannot reach qmaster. I had to restart, then suspend and disable the queue.
>
> Note that qstat -j currently reports:
> All queues dropped because of overload or full
>
> Note that I configured the schedule interval to 2 seconds, since many of my jobs are so fast that even 2 seconds is very inefficient for them, yet some are longer and memory-consuming, so I cannot allow more slots that would launch too many jobs at once.
>
> Am I overloading the system with too many jobs? What is the limit on a single strong machine? How will this scale when I run this on StarCluster?
>
> Any advice on how to efficiently handle many jobs, some of which are very short, will be appreciated. And I hope this interests the audience.
>
> Jacob
>
> Sent from my iPhone
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster