[StarCluster] minimal cost with loadbalance

David Mrva davidm at cantabresearch.com
Thu May 29 03:10:46 EDT 2014


Hello,

I started using StarCluster with Amazon spot instances. I expect that the 
workload of my application will fluctuate a lot and I aim to minimise 
the cost of running the spot instances. StarCluster's loadbalancer seems 
to go some way in this direction. It adds more spot instances when the 
SGE queue is busy and removes unused nodes. The removal of the nodes 
interacts with SGE's strategy for assigning jobs to queues. SGE chooses 
the node with the lowest load average to assign a job to. If there are 
more nodes in the cluster than are necessary to execute the jobs, this 
strategy will result in spreading the jobs that need to be executed 
across as many nodes as possible. This behaviour reduces the chances of 
some of the nodes staying unused and potentially being removed by the 
load balancer.
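
To illustrate the spreading effect, here is a minimal Python sketch. It is 
a toy model, not SGE's actual scheduler: it assumes jobs arrive one at a 
time and uses the number of running jobs on a node as a stand-in for its 
load average. The function name and the cluster size are hypothetical.

```python
def assign_least_loaded(num_nodes, slots_per_node, num_jobs):
    """Place jobs one at a time on the node with the fewest running jobs.

    Assumption: the running-job count approximates the load average
    that the real scheduler would compare.
    """
    running = [0] * num_nodes
    for _ in range(num_jobs):
        # Only nodes with a free slot can accept a job.
        candidates = [i for i in range(num_nodes) if running[i] < slots_per_node]
        if not candidates:
            break  # cluster is full; remaining jobs would wait in the queue
        target = min(candidates, key=lambda i: running[i])
        running[target] += 1
    return running


# 8 jobs on 4 nodes with 8 slots each: one node could hold all of them.
running = assign_least_loaded(num_nodes=4, slots_per_node=8, num_jobs=8)
idle_nodes = sum(1 for r in running if r == 0)
print(running, idle_nodes)
```

In this model every node ends up with 2 jobs and no node is idle, so the 
load balancer has nothing to remove, even though a single node has enough 
slots for all 8 jobs.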

I'd like to configure StarCluster in such a way that SGE jobs go to node 
A for as long as there are slots available on it and they go to node B 
only if there is no vacant slot on node A. For example, on a cluster 
with nodes A and B and 8 slots on each node, if 4 slots are in use on 
node A and 4 more jobs are submitted to SGE, I'd like all 4 of these 
new jobs to go to node A. Using the "orte" parallel environment with 
"fill_up" allocation strategy does not achieve this. For the above 
example, using the "fill_up" allocation strategy will pick node B 
(lowest load average node) and assign all 4 new jobs to it, resulting in 
nodes A and B running 4 jobs each instead of A running 8 jobs and B none.
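
The difference between the two outcomes in this example can be sketched in 
Python as follows. Both functions are hypothetical illustrations of the 
placement I want and the placement I observe. They are not SGE or 
StarCluster code, and the name-order rule in pack_first is just one 
possible way to define "node A before node B".

```python
def pack_first(used, slots_per_node, new_jobs):
    """Desired policy: fill nodes in name order, moving on only when one is full."""
    used = dict(used)  # copy so the caller's state is not modified
    for _ in range(new_jobs):
        for node in sorted(used):
            if used[node] < slots_per_node:
                used[node] += 1
                break
    return used


def fill_up_from_least_loaded(used, slots_per_node, new_jobs):
    """Observed behaviour: start at the least-loaded node and fill it up first."""
    used = dict(used)
    # Node order is fixed when the jobs arrive, least-loaded node first.
    order = sorted(used, key=lambda n: used[n])
    for _ in range(new_jobs):
        for node in order:
            if used[node] < slots_per_node:
                used[node] += 1
                break
    return used


start = {"A": 4, "B": 0}  # 4 of 8 slots busy on A, B empty
desired = pack_first(start, slots_per_node=8, new_jobs=4)
observed = fill_up_from_least_loaded(start, slots_per_node=8, new_jobs=4)
print(desired)   # {'A': 8, 'B': 0}
print(observed)  # {'A': 4, 'B': 4}
```

With the desired placement node B stays empty and could be removed by the 
load balancer; with the observed placement both nodes stay busy.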

How can I use StarCluster's built-in load balancer to minimise the cost 
of running spot instances by minimising the number of unused CPUs in the 
way described above?

Many thanks,
David

