[StarCluster] minimal cost with loadbalance
David Mrva
davidm at cantabresearch.com
Thu May 29 03:10:46 EDT 2014
Hello,
I started using StarCluster with Amazon spot instances. I expect that the
workload of my application will fluctuate a lot and I aim to minimise
the cost of running the spot instances. StarCluster's loadbalancer seems
to go some way in this direction. It adds more spot instances when the
SGE queue is busy and removes unused nodes. The removal of the nodes
interacts with SGE's strategy for assigning jobs to queues. SGE chooses
the node with the lowest load average to assign a job to. If there are
more nodes in the cluster than are necessary to execute the jobs, this
strategy will result in spreading the jobs that need to be executed
across as many nodes as possible. This behaviour reduces the chances of
some of the nodes staying unused and potentially being removed by the
load balancer.
I'd like to configure StarCluster in such a way that SGE jobs go to node
A for as long as there are slots available on it and they go to node B
only if there is no vacant slot on node A. For example, on a cluster
with nodes A and B and 8 slots on each node, if 4 slots are in use on
node A and 4 more jobs arrive in SGE, I'd like all 4 of these new jobs
to go to node A. Using the "orte" parallel environment with the
"fill_up" allocation strategy does not achieve this. For the above
example, using the "fill_up" allocation strategy will pick node B
(lowest load average node) and assign all 4 new jobs to it, resulting in
nodes A and B running 4 jobs each instead of A running 8 jobs and B none.
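To make the example concrete, here is a small Python sketch of the two
placement policies. It is only an illustration and not SGE code: the
function names (assign, least_loaded, pack_first) are made up, and the
number of used slots stands in for the load average, which is a
simplifying assumption.

```python
def assign(jobs, used, slots, pick):
    """Place `jobs` single-slot jobs one at a time and return the new usage.

    used  -- dict mapping node name to slots currently in use
    slots -- slots per node (assumed equal on every node)
    pick  -- policy function choosing one node among those with a free slot
    """
    used = dict(used)  # work on a copy, leave the caller's dict untouched
    for _ in range(jobs):
        free = [n for n in sorted(used) if used[n] < slots]
        if not free:
            raise RuntimeError("no free slots left in the cluster")
        used[pick(free, used)] += 1
    return used


def least_loaded(free, used):
    # Stand-in for the behaviour I observe: the emptiest node wins.
    return min(free, key=lambda n: used[n])


def pack_first(free, used):
    # Behaviour I would like: the fullest node that still has room wins.
    return max(free, key=lambda n: used[n])


start = {"A": 4, "B": 0}
print(assign(4, start, 8, least_loaded))
print(assign(4, start, 8, pack_first))
```

With the second policy node B stays idle, so the load balancer would be
free to remove it.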
How can I use StarCluster's built-in load balancer to minimise the cost
of running spot instances by minimising the number of unused CPUs in the
way described above?
Many thanks,
David