<div dir="ltr"><div>You can set the Grid Engine &quot;queue_sort_method&quot; parameter to &quot;seq_no&quot; in sched_conf:<br><br><a href="http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html">http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html</a><br>

<br></div>With that setting the scheduler orders queue instances by sequence number first rather than by load. For this to work, each instance needs a different &quot;seq_no&quot;, so a small StarCluster plugin will need to be developed, i.e. the plugin will assign a new seq_no when an instance gets created.<br>
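<br>A minimal sketch of the bookkeeping such a plugin might do, written in Python because StarCluster plugins are Python. The helper names are hypothetical. It assumes the cluster queue is named all.q and that per-host seq_no values are applied with qconf -rattr. It only builds the command and does not run it.<br>

```python
# Hypothetical helpers for illustration; not part of StarCluster or Grid Engine.
# Assumption: per-host seq_no overrides use the "[host=value]" list syntax.


def next_seq_no(assigned):
    """Return the smallest positive seq_no not yet used by any host."""
    used = set(assigned.values())
    candidate = 1
    while candidate in used:
        candidate += 1
    return candidate


def assign_seq_no(assigned, host):
    """Give `host` a seq_no if it has none; existing hosts keep theirs."""
    if host not in assigned:
        assigned[host] = next_seq_no(assigned)
    return assigned[host]


def seq_no_attr(assigned, default=0):
    """Render the queue's seq_no value with one override per host."""
    parts = [str(default)]
    for host in sorted(assigned, key=assigned.get):
        parts.append("[%s=%d]" % (host, assigned[host]))
    return ",".join(parts)


def qconf_args(assigned, queue="all.q"):
    """Build (but do not run) the qconf call that replaces seq_no."""
    return ["qconf", "-rattr", "queue", "seq_no", seq_no_attr(assigned), queue]


if __name__ == "__main__":
    hosts = {}
    for name in ("master", "node001", "node002"):
        assign_seq_no(hosts, name)
    print(" ".join(qconf_args(hosts)))
```

In a real plugin this logic would presumably be called from the hook that runs when a node is added, with the resulting command executed on the master. Reusing the smallest free number means a replacement node takes over the slot of a removed one.<br>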

<div><div><div><div><div class="gmail_extra"><br clear="all"><div>Rayson<br><br>==================================================<br>Open Grid Scheduler - The Official Open Source Grid Engine<br><a href="http://gridscheduler.sourceforge.net/" target="_blank">http://gridscheduler.sourceforge.net/</a><br>

<a href="http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html" target="_blank">http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html</a></div>

<br><br><div class="gmail_quote">On Thu, May 29, 2014 at 3:10 AM, David Mrva <span dir="ltr">&lt;<a href="mailto:davidm@cantabresearch.com" target="_blank">davidm@cantabresearch.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

Hello,<br>

<br>

I started using StarCluster with Amazon spot instances. I expect that the<br>

workload of my application will fluctuate a lot and I aim to minimise<br>

the cost of running the spot instances. StarCluster&#39;s loadbalancer seems<br>

to go some way in this direction. It adds more spot instances when the<br>

SGE queue is busy and removes unused nodes. The removal of the nodes<br>

interacts with SGE&#39;s strategy for assigning jobs to queues. SGE chooses<br>

the node with the lowest load average to assign a job to. If there are<br>

more nodes in the cluster than are necessary to execute the jobs, this<br>

strategy will result in spreading the jobs that need to be executed<br>

across as many nodes as possible. This behaviour reduces the chances of<br>

some of the nodes staying unused and potentially being removed by the<br>

load balancer.<br>

<br>

I&#39;d like to configure StarCluster in such a way that SGE jobs go to node<br>

A for as long as there are slots available on it and they go to node B<br>

only if there is no vacant slot on node A. For example, on a cluster<br>

with nodes A and B and 8 slots on each node, if there are 4 slots being<br>

used on node A and 4 more jobs arrive to SGE, I&#39;d like all 4 of these<br>

new jobs to go to node A. Using the &quot;orte&quot; parallel environment with<br>

&quot;fill_up&quot; allocation strategy does not achieve this. For the above<br>

example, using the &quot;fill_up&quot; allocation strategy will pick node B<br>

(lowest load average node) and assign all 4 new jobs to it, resulting in<br>

nodes A and B running 4 jobs each instead of A running 8 jobs and B none.<br>
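<br>
The two placement behaviours in this example can be compared with a small Python sketch. It is only a toy model: the function names are hypothetical, and the number of busy slots stands in for the load average that SGE actually uses.<br>

```python
# Toy model of the example above; busy slots approximate load average.


def place_least_loaded(used, slots, jobs):
    """Send each job to the node with the fewest busy slots."""
    used = dict(used)  # work on a copy so the caller's dict is unchanged
    for _ in range(jobs):
        free = [n for n in sorted(used) if used[n] != slots]
        if not free:
            break  # every node is full
        target = min(free, key=lambda n: used[n])
        used[target] += 1
    return used


def place_by_seq_no(used, slots, jobs, order):
    """Send each job to the first node in `order` that has a free slot."""
    used = dict(used)
    for _ in range(jobs):
        free = [n for n in order if used[n] != slots]
        if not free:
            break
        used[free[0]] += 1
    return used


if __name__ == "__main__":
    start = {"A": 4, "B": 0}
    print(place_least_loaded(start, 8, 4))
    print(place_by_seq_no(start, 8, 4, ["A", "B"]))
```

With 4 busy slots on A and 4 new jobs, the first function ends with 4 jobs on each node, while the second fills A to 8 and leaves B empty, which is the behaviour wanted here.<br>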

<br>

How can I use StarCluster&#39;s built-in load balancer to minimise the cost<br>

of running spot instances by minimising the number of unused CPUs in the<br>

way described above?<br>

<br>

Many thanks,<br>

David<br>

_______________________________________________<br>

StarCluster mailing list<br>

<a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>

<a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>

</blockquote></div><br></div></div></div></div></div></div>