[StarCluster] Is StarCluster still under active development?

Tony Robinson tonyr at speechmatics.com
Fri Apr 1 12:01:24 EDT 2016


On 01/04/16 16:22, Rajat Banerjee wrote:
> Regarding:
> How about we just call qacct every 5 mins, or if the qacct buffer is 
> empty.
> Calling qacct and getting the job stats is the first part of the load 
> balancer's loop to see what the cluster is up to. I prioritized knowing 
> the current state, and keeping the LB running its loop as fast as 
> possible (2-10 seconds), so it could run in a 1-minute loop and stay 
> roughly on-schedule. It's easy to run the whole LB loop with 5 minutes 
> between loops with the command line arg polling_interval, if that 
> suits your workload better. I do not mean to sound dismissive, but the 
> command line options (with reasonable defaults) are there so you can 
> test and tweak to your workload.

Ah, I wasn't very clear. What I meant is that only the qacct stats would 
be updated every 5 minutes; I run the main loop every 30s.
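
As a minimal sketch of that idea (not StarCluster's actual code), the stats 
could sit behind a small cache that refetches only when it is empty or older 
than 5 minutes. The class name, the fetch callable wrapping the qacct call, 
and the injectable clock are all hypothetical names used for illustration:

```python
import time


class ThrottledStats:
    """Cache job stats and refetch only when empty or older than max_age."""

    def __init__(self, fetch, max_age=300.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable wrapping the qacct call
        self._max_age = max_age  # seconds between refreshes (5 minutes here)
        self._clock = clock      # injectable so the timing can be tested
        self._stats = []
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = (self._fetched_at is None
                 or now - self._fetched_at >= self._max_age)
        # Refetch when the cache is stale or the buffer is empty.
        if stale or not self._stats:
            self._stats = self._fetch()
            self._fetched_at = now
        return self._stats
```

The main loop can then call get() on every 30s pass and only pay for qacct 
when the cached stats have aged out.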

But calling qacct doesn't take any time, so we could do it every polling 
interval:

root@master:~# date
Fri Apr  1 16:54:31 BST 2016
root@master:~# echo qacct -j -b `date +%y%m%d`$((`date +%H` - 3))`date +%m`
qacct -j -b 1604011304
root@master:~# time  qacct -j -b `date +%y%m%d`$((`date +%H` - 3))`date +%m` | wc
   99506  224476 3307423

real    0m0.588s
user    0m0.560s
sys    0m0.076s
root@master:~#
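
Two details of the one-liner are worth noting: the trailing `date +%m` 
expands to the month (hence the final 04 in the echoed command, where the 
minute field `%M` would have given 54), and subtracting 3 from the hour goes 
negative before 03:00. A minimal Python sketch of building the same 
"three hours ago" argument, assuming the 10-digit YYMMDDhhmm layout used in 
the transcript (the function name and hours_back parameter are hypothetical):

```python
from datetime import datetime, timedelta


def qacct_begin_time(now, hours_back=3):
    """Return a YYMMDDhhmm string for `hours_back` hours before `now`."""
    # Subtracting a timedelta handles midnight and month rollover, which
    # plain arithmetic on the hour field does not.
    start = now - timedelta(hours=hours_back)
    # %M is minutes; %m would repeat the month.
    return start.strftime("%y%m%d%H%M")


print("qacct -j -b " + qacct_begin_time(datetime.now()))
```

For the time shown in the transcript, 
qacct_begin_time(datetime(2016, 4, 1, 16, 54, 31)) gives "1604011354".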


If calling qacct does turn out to be slow, the update could run at the end 
of the loop, so it would have the whole of the loop's wait time to complete.
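
A minimal sketch of that ordering, assuming two hypothetical callables 
(balance_step for the scaling decision and refresh_stats for the qacct call); 
it is an illustration of the idea rather than how the load balancer is 
actually structured:

```python
import threading
import time


def run_loop(balance_step, refresh_stats, polling_interval, iterations):
    """Run balance_step, then refresh stats while waiting for the next pass.

    balance_step and refresh_stats are hypothetical callables standing in
    for the load balancer's scaling decision and its qacct call.
    """
    for _ in range(iterations):
        balance_step()
        # Start the (possibly slow) stats refresh in the background so it
        # overlaps with the wait instead of delaying the next pass.
        worker = threading.Thread(target=refresh_stats)
        worker.start()
        time.sleep(polling_interval)
        # If the refresh outlasts the wait, block until it finishes so the
        # next pass never reads half-updated stats.
        worker.join()
```

With a 30s polling_interval, a qacct call taking up to 30s would add nothing 
to the loop time; only a slower call would delay the next pass.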

> Regarding:
> Three sorts of jobs, all of which should occur in the same numbers,
> Have you tried testing your call to qacct to see if it's returning 
> what you want? You could modify it in your source if it's not 
> representative of your jobs:
> https://github.com/jtriley/StarCluster/blob/develop/starcluster/balancers/sge/__init__.py#L528
> qacct_cmd = 'qacct -j -b ' + qatime

Yes, thanks, I'm comparing to running qacct outside of the load balancer.

> Obviously one size doesn't fit all here, but if you find a set of args 
> for qacct that work better for you, let me know.

At the moment I don't think the output of qacct is used at all, is it? 
I thought it was only used to report job stats; I don't think it's 
really used to bring nodes up or down.


Tony

-- 
Speechmatics is a trading name of Cantab Research Limited
We are hiring: www.speechmatics.com/careers
Dr A J Robinson, Founder, Cantab Research Ltd
Phone direct: 01223 794096, office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK