<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div>Hi Gordon,&nbsp;</div><div><br></div><div>Starting a 100-node cluster takes 30 minutes (and 1 hour with 200 nodes). With an EBS-backed AMI the machines' boot time is very short, less than 1 minute, and above all constant (it does not grow with the number of requested instances).&nbsp;</div><div><br></div><div>So all the time is spent configuring the cluster.&nbsp;</div><div><br></div><div>StarCluster does a lot of tasks automatically (and for this reason I love it!).&nbsp;</div><div><br></div><div>But by saving the state of a configured cluster, another cluster instance could be deployed by updating only the /etc/hosts files and the SGE queue configuration. This would greatly reduce the total time required to start.&nbsp;</div><div><br></div><div>Does it make sense?</div></div><div><br></div><div><br></div><div>Cheers,</div><div>Paolo</div><div><br></div><div><br></div><div><br></div><div><br></div><div><div><div>On Oct 28, 2011, at 4:24 PM, Mark Gordon wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hi Paolo:<br><br>I wonder, what percentage of the launch time do you think is spent configuring the nodes?<br><br>cheers,<br>Mark<br><br><br><div class="gmail_quote">On Fri, Oct 28, 2011 at 4:57 AM, Paolo Di Tommaso <span dir="ltr">&lt;<a href="mailto:Paolo.DiTommaso@crg.eu">Paolo.DiTommaso@crg.eu</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Dear All,<br>

<br>

I'm still struggling with the problem of large clusters taking such a long time to launch.<br>

<br>

I think some improvements are possible with better multithread handling, but I'm not a Python guru, so I cannot go into detail about that.<br>

<br>

Anyway, I'm looking for a more "radical" approach. My idea is to launch a 2-node cluster, save the master and slave nodes as two separate AMIs, and use these to deploy a cluster of any size without having to install and configure everything from scratch (NFS, SGE, passwordless access, etc.), modifying only what has changed.<br>


<br>

<br>

So my question is: what is the "delta" in the configuration files between two different cluster instances of X and Y nodes?<br>

<br>

Knowing this, it could be quite easy to write a StarCluster plugin that applies only these changes, achieving a much faster launch time.<br>
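<br>
As a rough illustration of the /etc/hosts part of such a delta, here is a minimal Python sketch. It is only an assumption-laden example, not StarCluster code: the function names (cluster_hosts, hosts_delta, apply_delta) are hypothetical, the alias scheme "master", "node001", "node002", ... is assumed, and the SGE queue changes are not covered.<br>
<pre>
```python
def cluster_hosts(private_ips):
    """Map assumed aliases to IPs: first IP is 'master', the rest 'node001', ..."""
    aliases = ["master"] + ["node%03d" % i for i in range(1, len(private_ips))]
    return dict(zip(aliases, private_ips))


def hosts_delta(old, new):
    """Compare two alias-to-IP maps and return (to_add, to_remove)."""
    # Entries that are new, or whose IP changed, must be (re)written.
    to_add = {alias: ip for alias, ip in new.items() if old.get(alias) != ip}
    # Entries that disappeared, or whose IP changed, must be dropped first.
    to_remove = sorted(a for a in old if a not in new or old[a] != new[a])
    return to_add, to_remove


def apply_delta(hosts_text, to_add, to_remove):
    """Return the text of a hosts file with only the delta applied."""
    stale = set(to_remove) | set(to_add)
    kept = []
    for line in hosts_text.splitlines():
        # Ignore trailing comments; fields[0] is the IP, the rest are names.
        fields = line.split("#", 1)[0].split()
        if len(fields) != 0 and stale.intersection(fields[1:]):
            continue  # drop the outdated entry for this alias
        kept.append(line)
    kept.extend("%s %s" % (ip, alias) for alias, ip in sorted(to_add.items()))
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses: a saved 2-node cluster and a new 3-node cluster.
    old = cluster_hosts(["10.0.0.1", "10.0.0.2"])
    new = cluster_hosts(["10.0.1.1", "10.0.1.2", "10.0.1.3"])
    to_add, to_remove = hosts_delta(old, new)
    saved = "127.0.0.1 localhost\n10.0.0.1 master\n10.0.0.2 node001\n"
    print(apply_delta(saved, to_add, to_remove))
```
</pre>
The same text would have to be written to every node, so the cost still grows with the cluster size, but it is a single small file copy per node instead of a full install and configure step.<br>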

<br>

<br>

Thank you,<br>

<div class="im"><br>

Paolo Di Tommaso<br>

Software Engineer<br>

Comparative Bioinformatics Group<br>

Centre de Regulacio Genomica (CRG)<br>

Dr. Aiguader, 88<br>

08003 Barcelona, Spain<br>

<br>

<br>

<br>

<br>

</div><div><div></div><div class="h5">On Oct 20, 2011, at 9:48 PM, Rayson Ho wrote:<br>

<br>

&gt; ----- Original Message -----<br>

&gt;&gt; However, if one can wrap around the real ssh with a fake ssh script that sleeps 30 seconds and then runs the real ssh, then we can see how good (or bad) the Workerpool handles long latency commands - and we will start from there to optimize the launch performance.<br>

&gt;<br>

&gt; Replying to myself - after quickly reading the code...<br>

&gt;<br>

&gt; StarCluster uses Paramiko instead of executing ssh, so wrapping around a long latency ssh script won't work.<br>

&gt;<br>

&gt; And there are quite a lot of discussions about issues with multithreaded programs that call Paramiko -- just google: Paramiko+multithreading<br>

&gt;<br>

&gt;<br>

&gt; Rayson<br>

&gt;<br>

&gt; =================================<br>

&gt; Grid Engine / Open Grid Scheduler<br>

&gt; <a href="http://gridscheduler.sourceforge.net/" target="_blank">http://gridscheduler.sourceforge.net</a><br>

&gt; _______________________________________________<br>

&gt; StarCluster mailing list<br>

&gt; <a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>

&gt; <a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>

<br>

<br>

<br>

</div></div></blockquote></div><br><br clear="all"><br>-- <br><span style="font-family:'Bitstream Charter';font-size:medium"><pre><span style="font-family:arial;white-space:normal;font-size:small"><pre><span style="font-family:arial;white-space:normal">Mark Gordon</span></pre>

</span></pre></span><div>Systems Analyst<br>Department of Physics<br>University of Alberta</div><br>

</blockquote></div><br></div></body></html>