<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";}
span.EmailStyle17
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:"Courier New";}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">You can also use the method described in StephansBlog, which may make for an easier plugin than the seq_no approach:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><a href="http://wiki.gridengine.info/wiki/index.php/StephansBlog">http://wiki.gridengine.info/wiki/index.php/StephansBlog</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">As a proof of concept, I did NOT create a new plugin but modified the existing sge plugin instead (the sge.py template and the sge.py plugin code). That is probably not a great
solution in the long run, but it works as expected. Feel free to create your own plugin from these mods. It would be cool if this (or the seq_no bit) were already in StarCluster, so that users would only need to modify their scheduler config to force this ‘fill up’
behavior.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">$ diff templates/sge.py.dist templates/sge.py<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">88a89,100<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> sge_exec_template = """<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> hostname %s<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> load_scaling NONE<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> complex_values slots=%s<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> user_lists NONE<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> xuser_lists NONE<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> projects NONE<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> xprojects NONE<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> usage_scaling NONE<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> report_variables NONE<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> """<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">$ diff plugins/sge.py.dist plugins/sge.py<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">106a107,111<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> master = self._master<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> execconf = master.ssh.remote_file("/tmp/execconf.txt", "w")<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> execconf.write(sge.sge_exec_template % (node.alias, num_slots))<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> execconf.close()<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">> master.ssh.execute('qconf -Me %s' % execconf.name)</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p></o:p></span></p>
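<p class="MsoNormal">A minimal sketch of what the plugin change above does, leaving out the SSH part: it fills the exec host template with the node alias and the slot count before the file is handed to qconf -Me. The names render_exec_conf and parse_exec_conf are made up for illustration and are not part of StarCluster; only the template text is taken from the diff.</p>
<pre>
```python
# Template text copied from the diff; the helper names below are hypothetical.
SGE_EXEC_TEMPLATE = """\
hostname %s
load_scaling NONE
complex_values slots=%s
user_lists NONE
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE
"""


def render_exec_conf(alias, num_slots):
    """Return the exec host configuration text for one node."""
    return SGE_EXEC_TEMPLATE % (alias, num_slots)


def parse_exec_conf(text):
    """Parse 'key value' lines back into a dict, as a quick sanity check."""
    conf = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split(None, 1)
            conf[key] = value
    return conf


conf = parse_exec_conf(render_exec_conf("node001", 8))
print(conf["hostname"], conf["complex_values"])
```
</pre>
<p class="MsoNormal">Rendering the text locally like this makes it easy to check the slots value for a node before anything is written to the master.</p>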
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">For this to work, the SGE scheduler configuration needs to be adjusted as well (qconf -msconf). I didn’t automate that in StarCluster, since this is just a proof of concept and the master
stays up anyway:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">algorithm default<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">schedule_interval 0:2:0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">maxujobs 0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">queue_sort_method load<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">job_load_adjustments NONE<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">load_adjustment_decay_time 0:0:0<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">load_formula slots<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">schedd_job_info true<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">flush_submit_sec 1<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Courier New";color:#C55A11;mso-style-textfill-fill-color:#C55A11;mso-style-textfill-fill-alpha:100.0%">flush_finish_sec 1<o:p></o:p></span></p>
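<p class="MsoNormal">A toy simulation of the intended effect, not of the real scheduler. It assumes that with load_formula set to slots and queue_sort_method set to load, the host with the fewest remaining free slots is tried first. The functions pick_host and place_jobs are hypothetical, and the numbers are taken from David's two-node example.</p>
<pre>
```python
def pick_host(free_slots):
    """Pick the host with the fewest free slots that still has one free.

    Assumption: the scheduler sorts hosts by their remaining slots value
    and prefers the lowest one, which packs jobs onto busy hosts first.
    """
    candidates = [(free, host) for host, free in free_slots.items() if free != 0]
    if not candidates:
        return None
    return min(candidates)[1]


def place_jobs(free_slots, num_jobs):
    """Assign jobs one at a time and return the list of chosen hosts."""
    free = dict(free_slots)
    placement = []
    for _ in range(num_jobs):
        host = pick_host(free)
        if host is None:
            break  # cluster is full; remaining jobs would wait in the queue
        free[host] -= 1
        placement.append(host)
    return placement


# 8 slots per node, 4 already used on A, 4 new jobs arrive.
print(place_jobs({"A": 4, "B": 8}, 4))
```
</pre>
<p class="MsoNormal">In this model all four new jobs land on node A, leaving node B idle so the load balancer can remove it.</p>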
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Cheers,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">-Hugh<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:11.0pt;font-family:"Calibri","sans-serif"">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri","sans-serif""> starcluster-bounces@mit.edu [mailto:starcluster-bounces@mit.edu]
<b>On Behalf Of </b>Rayson Ho<br>
<b>Sent:</b> Thursday, May 29, 2014 7:54 AM<br>
<b>To:</b> David Mrva<br>
<b>Cc:</b> starcluster@mit.edu<br>
<b>Subject:</b> Re: [StarCluster] minimal cost with loadbalance<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">You can set the Grid Engine "queue_sort_method" parameter to "seq_no" in sched_conf:<br>
<br>
<a href="http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html">http://gridscheduler.sourceforge.net/htmlman/htmlman5/sched_conf.html</a><o:p></o:p></p>
</div>
<p class="MsoNormal">For this to work, each instance needs a different "seq_no", so a small StarCluster plugin will need to be developed, i.e. the plugin will assign a new seq_no when an instance gets created.<o:p></o:p></p>
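<p class="MsoNormal">As a rough sketch of the bookkeeping such a plugin would need, the function below builds a seq_no value with one override per node, assuming the bracketed per-host form of queue attributes described in queue_conf(5). The name build_seq_no_attr is hypothetical, and applying the value to the queue (for example through qconf) is deliberately left out.</p>
<pre>
```python
def build_seq_no_attr(node_aliases, default=0):
    """Build a queue seq_no value with one host-specific override per node.

    The result looks like 0,[node001=1],[node002=2]; nodes are numbered
    in the order given, so earlier nodes get lower sequence numbers.
    """
    overrides = ["[%s=%d]" % (alias, i) for i, alias in enumerate(node_aliases, 1)]
    return ",".join([str(default)] + overrides)


print(build_seq_no_attr(["node001", "node002", "node003"]))
```
</pre>
<p class="MsoNormal">Numbering nodes in creation order means the oldest nodes are filled first, so the most recently added ones are the most likely to be left idle.</p>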
<div>
<div>
<div>
<div>
<div>
<p class="MsoNormal"><br clear="all">
<o:p></o:p></p>
<div>
<p class="MsoNormal">Rayson<br>
<br>
==================================================<br>
Open Grid Scheduler - The Official Open Source Grid Engine<br>
<a href="http://gridscheduler.sourceforge.net/" target="_blank">http://gridscheduler.sourceforge.net/</a><br>
<a href="http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html" target="_blank">http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html</a><o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Thu, May 29, 2014 at 3:10 AM, David Mrva &lt;<a href="mailto:davidm@cantabresearch.com" target="_blank">davidm@cantabresearch.com</a>&gt; wrote:<o:p></o:p></p>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal">Hello,<br>
<br>
I started using StarCluster with Amazon spot instances. I expect that the<br>
workload of my application will fluctuate a lot and I aim to minimise<br>
the cost of running the spot instances. StarCluster's loadbalancer seems<br>
to go some way in this direction. It adds more spot instances when the<br>
SGE queue is busy and removes unused nodes. The removal of the nodes<br>
interacts with SGE's strategy for assigning jobs to queues. SGE chooses<br>
the node with the lowest load average to assign a job to. If there are<br>
more nodes in the cluster than are necessary to execute the jobs, this<br>
strategy will result in spreading the jobs that need to be executed<br>
across as many nodes as possible. This behaviour reduces the chances of<br>
some of the nodes staying unused and potentially being removed by the<br>
load balancer.<br>
<br>
I'd like to configure StarCluster in such a way that SGE jobs go to node<br>
A for as long as there are slots available on it and they go to node B<br>
only if there is no vacant slot on node A. For example, on a cluster<br>
with nodes A and B and 8 slots on each node if there are 4 slots being<br>
used on node A and 4 more jobs arrive to SGE, I'd like all 4 of these<br>
new jobs to go to node A. Using the "orte" parallel environment with<br>
"fill_up" allocation strategy does not achieve this. For the above<br>
example, using the "fill_up" allocation strategy will pick node B<br>
(lowest load average node) and assign all 4 new jobs to it, resulting in<br>
nodes A and B running 4 jobs each instead of A running 8 jobs and B none.<br>
<br>
How can I use StarCluster's built-in load balancer to minimise the cost<br>
of running spot instances by minimising the number of unused CPUs in the<br>
way described above?<br>
<br>
Many thanks,<br>
David<br>
_______________________________________________<br>
StarCluster mailing list<br>
<a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>
<a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><o:p></o:p></p>
</blockquote>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>