<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Hi Sumita,&nbsp;<div><br></div><div>EBS-backed instances are faster to boot by design. See here&nbsp;<a href="http://goo.gl/LOqgb">http://goo.gl/LOqgb</a></div><div><br></div><div>I had a problem similar to yours, and I solved it by updating to the latest version and using EBS-backed instances.&nbsp;</div><div><br></div><div>About your question: I'm not a StarCluster developer so I cannot say for sure, but I think it needs all nodes to be up because configuring the cluster requires knowing the IP addresses of all nodes, and every node needs to know the IPs of the others (to set up /etc/hosts, for example).&nbsp;</div><div><br></div><div>That said, in my opinion the current bottleneck with StarCluster is that it configures nodes serially, one after another (or at least so it seems from the benchmark results), and this creates a huge problem if you want to use it to deploy large clusters (with 50 or more nodes).&nbsp;</div><div><br></div><div><br></div><div>Cheers,</div><div>Paolo</div><div><br></div><div><br></div><div><br></div><div><div><div>On Nov 9, 2011, at 1:00 AM, Sumita Sinha wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="content-type" content="text/html; charset=utf-8">Thanks for the quick response.<div><br></div><div><br></div><div>I tried with instance-store instances. Is there any reason that EBS-backed instances take less time to boot?</div>

<div><br></div><div>I tried creating the cluster again with 30 nodes; this time it completed successfully in 14 minutes.</div><div><br></div><div>When a cluster create request is sent, I see these messages on the terminal:</div>

<meta http-equiv="content-type" content="text/html; charset=utf-8"><div><meta http-equiv="content-type" content="text/html; charset=utf-8"><span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px; ">&gt;&gt;&gt; </span><span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px; ">Waiting for all nodes to be in a 'running' state...</span></div>

<div><span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px; "><meta http-equiv="content-type" content="text/html; charset=utf-8">&gt;&gt;&gt; Waiting for SSH to come up on all nodes...</span></div>

<div><span class="Apple-style-span" style="border-collapse: collapse; font-family: arial, sans-serif; font-size: 13px; "><meta http-equiv="content-type" content="text/html; charset=utf-8"><div>&gt;&gt;&gt; Setting up the cluster...</div>

<div>&gt;&gt;&gt; Configuring hostnames...</div><div><meta http-equiv="content-type" content="text/html; charset=utf-8">&gt;&gt;&gt; Creating cluster user: sgeadmin (uid: 1001, gid: 1001)</div><div><br></div><div>So when a node is up and running in EC2, does StarCluster wait for all the nodes to be up and then start configuring them all at once?</div>

<div>Is there any parameter in the config file, or any option to the starcluster start command, that says "<font class="Apple-style-span" color="#000099">configuring the cluster and installing SGE/configuring NFS should be a parallel operation; no node should wait for the other nodes to be up before getting configured. That is, if we post a job to a ready node, it should start executing the job with whatever nodes are currently running and configured</font>"?</div>

<div><br></div><div>If the above is not possible, is there any specific reason why, when starting a cluster, StarCluster configures the nodes only once all of them are running?</div><div>If anything bad happens at the EC2 level and some of the nodes take a long time to start, is there any "fault tolerant technique" or "time out"?</div>

<div><br></div><div><meta http-equiv="content-type" content="text/html; charset=utf-8"><div>Regards</div><div>Sumita</div></div></span></div><div><br></div><div><br></div><div><br></div><div></div><br><div class="gmail_quote">

On Tue, Nov 8, 2011 at 7:55 PM, Paolo Di Tommaso <span dir="ltr">&lt;<a href="mailto:Paolo.DiTommaso@crg.eu">Paolo.DiTommaso@crg.eu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div style="word-wrap:break-word"><div><div>Are you using instance-store instances or EBS-backed instances?</div><div><br></div><div>The latter are much faster to boot.&nbsp;</div><div><br></div><div><br></div><div>Cheers,</div>

<div>Paolo</div><div><div></div><div class="h5"><div><br></div><div><br></div><div><br></div><div>On Nov 8, 2011, at 3:12 PM, Justin Riley wrote:</div><br></div></div><blockquote type="cite"><div><div></div><div class="h5">

  <div bgcolor="#FFFFFF" text="#000000">

    <br>

    -----BEGIN PGP SIGNED MESSAGE-----<br>

    Hash: SHA1<br>

    <br>

    Hi Sumita,<br>

    <br>

    Were you using spot instances? If not, I believe there's a default

    limit of 20 instances for flat-rate instances, which

    *could* be related to your issue. With spot instances you can create

    up to 100 instances by default. So, if you need more than 20 nodes

    and do not wish to submit a request to Amazon to increase your

    flat-rate instance limit, you should be using spot instances:<br>

    <br>

    $ starcluster start -s 30 -b 0.50 mycluster<br>

    <br>

    That said, StarCluster has no limit on the number of nodes you

    can create. However, as you've seen, sometimes EC2 instances can

    take longer to become 'running' than usual. Unfortunately this is

    purely an EC2 back-end issue that cannot be resolved directly by

    StarCluster. In my experience 22 minutes *is* quite a while to wait

    for any instance to come up; however, I have had instances take up

    to 15 min in the past, so this is not a total surprise to me.<br>

    <br>

    In the future if you run into this problem of waiting for an

    instance to change from 'pending' to 'running' for too long (e.g.

    15min+), I would recommend simply terminating the faulty instance

    from the AWS console and then restarting the cluster using:<br>

    <br>

    $ starcluster restart mycluster<br>

    <br>

    This should reboot all the currently running instances and begin

    configuring the cluster and avoid having to terminate the entire

    cluster and lose instance hours.<br>

    <br>

    HTH,<br>

    <br>

    ~Justin<br>

    <br>

    On 11/8/11 6:39 AM, Sumita Sinha wrote:<br>

    <span style="white-space:pre-wrap">&gt; Hello ,<br>

      &gt;<br>

      &gt; Currently working with starcluster on EC2.<br>

      &gt;<br>

      &gt; Tried creating a cluster with 30 nodes of type m1.small using

      AMI - ami-8cf913e5.<br>

      &gt; Cluster creation was never completed as i found out that one

      node node025 was showing pending status.<br>

      &gt; I waited for almost 22 minutes then terminated the cluster.<br>

      &gt; Cluster was terminated properly. Is there any limit to the

      creation of nodes?<br>

      &gt;<br>

      &gt;<br>

      &gt;<br>

      &gt;<br>

      &gt; -- <br>

      &gt; Regards<br>

      &gt; Sumita Sinha<br>

      &gt;<br>

      &gt;</span><br>

    <br>

    -----BEGIN PGP SIGNATURE-----<br>

    Version: GnuPG v1.4.11 (Darwin)<br>

    Comment: Using GnuPG with Mozilla - <a href="http://enigmail.mozdev.org/" target="_blank">http://enigmail.mozdev.org/</a><br>

    <br>

    iEYEARECAAYFAk65OL4ACgkQ4llAkMfDcrm9MACghU/Ey4v653fsD8XmpbQKONNp<br>

    vdkAniIfFExWjqGAOWRolMrtePHfl4AL<br>

    =Q8NI<br>

    -----END PGP SIGNATURE-----<br>

    <br>

  </div></div></div>

_______________________________________________<br>StarCluster mailing list<br><a href="mailto:StarCluster@mit.edu" target="_blank">StarCluster@mit.edu</a><br><a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>

</blockquote></div><br></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Regards<br>Sumita Sinha<br><br><br>

</blockquote></div><br></div></body></html>