[StarCluster] Is StarCluster still under active development?

Tony Robinson tonyr at speechmatics.com
Fri Mar 25 15:56:23 EDT 2016


Hi Rajat,

The main issue that I have with the load balancer is that bringing up 
or taking down a node sometimes fails, and this causes the load balancer 
to fall over.   This is almost certainly an issue with boto - I just 
haven't looked into it enough.

I'm working on the load balancer right now.   I'm running a few different 
sorts of jobs: some take half a minute, some take five minutes.   It 
takes me about five minutes to bring a node up, so load balancing is 
quite a hard task, and what's there at the moment certainly isn't optimal.

In your master's thesis you had a go at anticipating the future load 
based on the queue, although I see no trace of this in the current 
code.   What seems like the most obvious approach to me is to look at 
what's running and in the queue and see if it's all going to complete 
within some specified period.   If it is, then fine.   If not, assume you 
are going to bring n nodes up (start at n=1) and see if it'll complete 
then; if not, increment n and try again.
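
A minimal sketch of the search described above, in Python.  Everything 
here is an assumption made for illustration and is not taken from the 
StarCluster code: the function names, the greedy "next free slot" model 
of the scheduler, and the parameters (seconds until each existing slot 
is free, estimated durations of queued jobs, slots per node, node boot 
time, target window).

```python
import heapq


def estimated_finish(slot_free_at, queued_durations, extra_nodes,
                     slots_per_node, boot_time):
    """Estimate seconds until running and queued work completes.

    Hypothetical model: each queued job is handed to whichever slot
    becomes free first.  slot_free_at has one entry per existing slot
    (0 for an idle slot); slots on new nodes become free at boot_time.
    """
    # Running jobs finish regardless of how many nodes are added.
    finish = max(slot_free_at) if slot_free_at else 0.0
    free_at = list(slot_free_at)
    free_at += [boot_time] * (extra_nodes * slots_per_node)
    if queued_durations and not free_at:
        return float("inf")  # nothing available to run the queue
    heapq.heapify(free_at)
    for duration in queued_durations:
        start = heapq.heappop(free_at)       # earliest available slot
        heapq.heappush(free_at, start + duration)
        finish = max(finish, start + duration)
    return finish


def nodes_needed(slot_free_at, queued_durations, slots_per_node,
                 boot_time, window, max_nodes=20):
    """Smallest n such that the work is estimated to finish in window.

    Tries n = 0 first (no change needed), then n = 1, 2, ...  Returns
    max_nodes if even that many nodes would not meet the window.
    """
    for n in range(max_nodes + 1):
        finish = estimated_finish(slot_free_at, queued_durations, n,
                                  slots_per_node, boot_time)
        if finish <= window:
            return n
    return max_nodes
```

With one idle two-slot node, ten queued five-minute jobs, a five-minute 
boot time and a 15-minute window, nodes_needed([0, 0], [300] * 10, 2, 
300, 900) gives 1 under this model.  The estimate is only as good as the 
per-job durations fed into it, which is why avg_job_duration() matters.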

I've got a version of this running but it isn't complete because 
avg_job_duration() consistently under-reports.   I'm doing some 
debugging and it seems that jobstats[] has a bug: I have three types of 
job (a start, a middle and an end), and as they are all run in sequence 
jobstats[] should have equal numbers of each.   It doesn't.
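
One way to run this kind of check is sketched below.  It assumes the 
recorded stats can be exported as simple (job_type, duration_seconds) 
pairs; that layout and the function names are hypothetical and are not 
the actual jobstats[] structure.

```python
from collections import Counter, defaultdict


def summarize_by_type(records):
    """Count jobs and average their duration per job type.

    records is an assumed list of (job_type, duration_seconds) pairs.
    """
    counts = Counter()
    totals = defaultdict(float)
    for job_type, duration in records:
        counts[job_type] += 1
        totals[job_type] += duration
    means = {t: totals[t] / counts[t] for t in counts}
    return counts, means


def is_balanced(counts):
    """True when every job type was recorded the same number of times."""
    return len(set(counts.values())) <= 1
```

If is_balanced() returns False for jobs that always run as a 
start/middle/end sequence, some records are missing, and an overall 
average computed from them will be skewed towards whichever types were 
recorded more often.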

This is a weekend (with unreliable time) activity for me.   If you or 
anyone else wants to help with:

a) getting avg_job_duration() working, which probably means fixing 
jobstats[]
b) getting a clean, simple predictive load balancer working

then please contact me.


Tony

On 25/03/16 17:17, Rajat Banerjee wrote:
> I'll fix any issues with the load balancer if they come up.


-- 
Speechmatics is a trading name of Cantab Research Limited
We are hiring: www.speechmatics.com/careers 
<https://www.speechmatics.com/careers>
Dr A J Robinson, Founder, Cantab Research Ltd
Phone direct: 01223 794096, office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK