[StarCluster] loadbalance

Rajat Banerjee rajatb at post.harvard.edu
Fri Sep 20 11:28:10 EDT 2013


Hi Ryan,
Sorry, that was the wrong qacct command. I think I may know what's happening.
Are your jobs really long-running? The 'lookback window' used for checking the
job history may be too short for you. You could try setting it to at least
twice the duration of one of your qsub'd tasks. Notice how every other line
says ">>> Loading full job history". That comes up because the job stats are
empty: 'qacct -j -b <some time>' is coming back empty.
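
As a rough illustration of that rule of thumb, here is a small Python sketch
that picks a lookback window of at least twice the task duration, and never
less than the 3-hour default. The function name and the rounding up to whole
hours are my own assumptions, not anything in StarCluster itself.

```python
import math


def suggested_lookback_hours(task_hours, default_hours=3):
    """Suggest a lookback window, in whole hours, for a given task duration.

    Hypothetical helper: applies the "at least twice the duration of one
    task" rule of thumb and never goes below the 3-hour default.
    """
    # Round up so a 2.25-hour task yields 5 hours rather than 4.
    return max(default_hours, math.ceil(2 * task_hours))


if __name__ == "__main__":
    for hours in (0.5, 2.25, 5):
        print(hours, "hour task -> lookback of", suggested_lookback_hours(hours), "hours")
```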

Trying to reproduce the behavior from:
https://github.com/jtriley/StarCluster/blob/develop/starcluster/balancers/sge/__init__.py#L504

Could you send the output from the following? Make a date for approximately
when you started your cluster, in this format:

MMDDhhmm  (month, day, hours, minutes)

qacct -j -b <date in that format>

And please paste that qacct output here. That should always have a history
of all jobs. Then try the same with the date set to only 3 hours ago.
You can also try experimenting with the lookback window. The default is
3 hours, and you can pass a new value on the command line:

*Lookback window* (-l LOOKBACK_WIN, --lookback_window=LOOKBACK_WIN) - How
long, in hours, to look back for past job history
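
If it helps, here is a minimal Python sketch for building the MMDDhhmm
timestamp for a given number of hours back, so you don't have to work it out
by hand. The helper names are hypothetical, and it only prints the qacct
command rather than running it.

```python
from datetime import datetime, timedelta


def qacct_begin_time(hours_ago, now=None):
    """Return an MMDDhhmm string for the moment `hours_ago` hours before `now`."""
    now = now or datetime.now()
    return (now - timedelta(hours=hours_ago)).strftime("%m%d%H%M")


def qacct_command(hours_ago, now=None):
    """Build the 'qacct -j -b <time>' command line as a list of arguments."""
    return ["qacct", "-j", "-b", qacct_begin_time(hours_ago, now)]


if __name__ == "__main__":
    # Job history starting 3 hours ago, matching the default lookback window.
    print(" ".join(qacct_command(3)))
```

Run it on the master node and paste the printed command into your shell,
changing the 3 to however many hours back you want to look.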

Justin Riley, can you please update the doc on this site?
http://star.mit.edu/cluster/docs/0.93.3/manual/load_balancer.html

It says the window is in minutes but it's in fact in hours.

Thanks,
Raj