<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 01/04/16 16:22, Rajat Banerjee
wrote:<br>
</div>
<blockquote
cite="mid:CAAEsPuc=aOO75vH=yVrXd0HGaP=PQsShPf6Lfg+FOR_05wjezQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>Regarding:<br>
How about we just call qacct every 5 mins, or if the
qacct buffer is empty. <br>
</div>
<div>Calling qacct and getting the job stats is the
first part of the load balancer's loop to see what
the cluster is up to. I prioritized knowing the
current state and keeping the LB running its loop
as fast as possible (2-10 seconds), so it could run
in a 1-minute loop and stay roughly on schedule.
It's easy to run the whole LB loop with 5 minutes
between loops with the command line arg <span
class="">polling_interval, if that suits your
workload better. I do not mean to sound
dismissive, but the command line options (with
reasonable defaults) are there so you can test and
tweak to your workload.<br>
</span></div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Ah, I wasn't very clear. What I mean is that we only update the
qacct stats every 5 minutes. I run the main loop every 30s. <br>
<br>
But calling qacct takes hardly any time (about 0.6s here), so we could
do it every polling interval:<br>
<br>
root@master:~# date<br>
Fri Apr 1 16:54:31 BST 2016<br>
root@master:~# echo qacct -j -b `date +%y%m%d`$((`date +%H` -
3))`date +%m`<br>
qacct -j -b 1604011304<br>
root@master:~# time qacct -j -b `date +%y%m%d`$((`date +%H` -
3))`date +%m` | wc<br>
99506 224476 3307423<br>
<br>
real 0m0.588s<br>
user 0m0.560s<br>
sys 0m0.076s<br>
root@master:~# <br>
<br>
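A note on the timestamp in the transcript above: the last field uses
date +%m (month) where minutes (%M) were presumably intended, which is
why the begin time came out as 1304 rather than roughly 1354. The shell
arithmetic on the hour also breaks before 03:00 and loses the leading
zero when the result is a single digit. Below is a minimal Python sketch
of the same three-hour lookback, assuming qacct's -b option takes a
[[CC]YY]MMDDhhmm timestamp. The function name qacct_begin_time and the
lookback_hours parameter are made up for illustration and are not part
of StarCluster:<br>
<pre>
```python
from datetime import datetime, timedelta


def qacct_begin_time(now, lookback_hours=3):
    """Return a YYMMDDhhmm string for `lookback_hours` before `now`.

    Hypothetical helper: subtracting a timedelta handles midnight and
    month boundaries, and strftime keeps every field zero-padded.
    """
    start = now - timedelta(hours=lookback_hours)
    return start.strftime("%y%m%d%H%M")


# Same wall-clock time as the transcript above.
print(qacct_begin_time(datetime(2016, 4, 1, 16, 54, 31)))  # prints 1604011354
```
</pre>
The resulting string would be appended to 'qacct -j -b ' in the same
way as the shell version.<br>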
<br>
If calling qacct is slow elsewhere, the update could run at the end of
the loop, so it would have the whole of the loop's wait time to complete in.<br>
<br>
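The "every 5 minutes, or if the qacct buffer is empty" idea could be
sketched roughly as below. This is only an illustration, not
StarCluster's code: the class QacctStatsCache, its fetch callable
(standing in for whatever runs qacct and parses the output) and the
clock parameter are all hypothetical.<br>
<pre>
```python
import time


class QacctStatsCache:
    """Keep the last job stats and refetch only when stale or empty."""

    def __init__(self, fetch, refresh_interval=300, clock=time.monotonic):
        self.fetch = fetch                        # hypothetical: runs qacct, returns parsed stats
        self.refresh_interval = refresh_interval  # seconds between refreshes
        self.clock = clock                        # injectable so the logic can be tested
        self.stats = []
        self.last_refresh = None

    def get(self):
        now = self.clock()
        never_fetched = self.last_refresh is None
        stale = never_fetched or (now - self.last_refresh) >= self.refresh_interval
        # Refetch when the interval has elapsed or the buffer is empty.
        if stale or not self.stats:
            self.stats = self.fetch()
            self.last_refresh = now
        return self.stats
```
</pre>
With a 30s main loop and refresh_interval=300, fetch would run on
roughly every tenth pass, or on every pass while the buffer is still
empty.<br>
<br>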
<blockquote
cite="mid:CAAEsPuc=aOO75vH=yVrXd0HGaP=PQsShPf6Lfg+FOR_05wjezQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>Regarding:<br>
</div>
Three sorts of jobs, all of which should occur in the
same numbers,<br>
</div>
Have you tried testing your call to qacct to see if it's
returning what you want? You could modify it in your
source if it's not representative of your jobs:<br>
<a moz-do-not-send="true"
href="https://github.com/jtriley/StarCluster/blob/develop/starcluster/balancers/sge/__init__.py#L528">https://github.com/jtriley/StarCluster/blob/develop/starcluster/balancers/sge/__init__.py#L528</a><br>
<tt>qacct_cmd = 'qacct -j -b ' + qatime</tt><br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
Yes, thanks, I'm comparing to running qacct outside of the load
balancer.<br>
<br>
<blockquote
cite="mid:CAAEsPuc=aOO75vH=yVrXd0HGaP=PQsShPf6Lfg+FOR_05wjezQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>Obviously one size doesn't fit all here, but if you
find a set of args for qacct that work better for you, let
me know.<br>
</div>
</div>
</div>
</div>
</blockquote>
<br>
At the moment I don't think the output of qacct is used for anything
beyond reporting job stats, is it? I don't think it's really used to
decide when to bring nodes up or down.<br>
<br>
<br>
Tony<br>
<br>
<div class="moz-signature">-- <br>
Speechmatics is a trading name of Cantab Research Limited<br>
We are hiring: <a href="http://www.speechmatics.com/careers">www.speechmatics.com/careers</a><br>
Dr A J Robinson, Founder, Cantab Research Ltd<br>
Phone direct: 01223 794096, office: 01223 794497<br>
Company reg no GB 05697423, VAT reg no 925606030<br>
51 Canterbury Street, Cambridge, CB4 3QG, UK<br>
</div>
</body>
</html>