[StarCluster] Issue creating a cluster of 30 nodes with starcluster

Justin Riley jtriley at MIT.EDU
Wed Nov 9 01:45:42 EST 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Sumita,

Unless you've specifically asked Amazon to increase your instance
limit[1], I suspect you're running into the default 20-instance limit
for flat-rate instances that I mentioned earlier.

I would recommend trying spot instances[2]; they're usually cheaper
than flat-rate instances, and you can launch up to 100 of them by
default. To request a spot cluster, pass the --bid option to the start
command:

$ starcluster start --bid 0.50 mycluster

This will place a $0.50 spot bid on each node in the cluster except
for the master. The master node is always launched as a flat-rate
instance for stability.
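The arithmetic behind this advice can be sketched as a small shell function. This is a hypothetical helper, not part of StarCluster: it assumes the default per-account limits quoted in this thread (20 flat-rate, 100 spot), assumes no other instances are already running in the account, and uses the rule above that only the master is flat-rate when --bid is given.

```shell
# Hypothetical helper (not a StarCluster command). Assumes the default
# limits quoted in this thread and no other running instances.
FLAT_RATE_LIMIT=20
SPOT_LIMIT=100

# check_limits CLUSTER_SIZE USE_SPOT
#   CLUSTER_SIZE: total number of nodes, including the master
#   USE_SPOT:     "yes" if the cluster is started with --bid
check_limits() {
    size=$1
    use_spot=$2
    if [ "$use_spot" = "yes" ]; then
        # With --bid, only the master is launched as a flat-rate instance
        flat=1
        spot=$((size - 1))
    else
        # Without --bid, every node (master included) is flat-rate
        flat=$size
        spot=0
    fi
    if [ "$flat" -le "$FLAT_RATE_LIMIT" ] && [ "$spot" -le "$SPOT_LIMIT" ]; then
        echo "ok flat=$flat spot=$spot"
    else
        echo "over-limit flat=$flat spot=$spot"
    fi
}
```

For a 30-node cluster, `check_limits 30 no` prints `over-limit flat=30 spot=0`, while `check_limits 30 yes` prints `ok flat=1 spot=29`.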

To help you choose a reasonable spot bid, use the spothistory command:

$ starcluster spothistory m1.large

That said, you can check which nodes have SSH up using:

$ starcluster listclusters --show-ssh-status
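If you want to probe nodes yourself outside StarCluster, a rough alternative is to test whether TCP port 22 accepts connections on each node. This is a sketch, not how StarCluster checks SSH internally: it assumes bash (for its /dev/tcp redirection) and the coreutils `timeout` command are available, and the hostnames are whatever public DNS names you copy from the EC2 console. An open port only means sshd is accepting connections, not that the node is fully configured.

```shell
# Hypothetical sketch: report which hosts accept TCP connections on port 22.
# Usage: sh check_ssh.sh HOST1 HOST2 ...   (hostnames from the EC2 console)

# ssh_port_open HOST [PORT]: succeed if HOST:PORT accepts a TCP connection
# within 5 seconds. Relies on bash's /dev/tcp pseudo-device.
ssh_port_open() {
    host=$1
    port=${2:-22}
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

for host in "$@"; do
    if ssh_port_open "$host"; then
        echo "$host: SSH port open"
    else
        echo "$host: SSH port not reachable"
    fi
done
```

Any node reported as not reachable after the others are up is a candidate for terminating from the AWS console before running the restart command.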

Also, you can *always* reboot all nodes in the cluster and completely
reconfigure it using the "restart" command:

$ starcluster restart mycluster

HTH,

~Justin

[1] http://aws.amazon.com/contact-us/ec2-request/
[2] http://aws.amazon.com/ec2/spot-instances/

On 11/08/2011 07:20 PM, Sumita Sinha wrote:
> Hi Justin,
> 
> I tried creating a 30-node cluster again and found something new.
> I have been waiting for the last 20 minutes for the cluster to come
> up, and I get the message below. In EC2 all the nodes are up and
> running, but I don't know which node is taking so long to configure
> SSH, so I can't restart or terminate it.
> 
> >>> Using default cluster template: smallcluster
> >>> Validating cluster template settings...
> >>> Cluster template settings are valid
> >>> Starting cluster...
> >>> Launching a 30-node cluster...
> >>> Creating security group @sc-smallcluster...
> Reservation:r-0e2d7060
> >>> Waiting for cluster to come up... (updating every 30s)
> >>> Waiting for all nodes to be in a 'running' state...
> 29/29 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> >>> Waiting for SSH to come up on all nodes...
> 28/29 |-------------------------------------------------------------   |  96%
> 
> 
> Regards Sumita
> 
> On Tue, Nov 8, 2011 at 7:42 PM, Justin Riley <jtriley at mit.edu> wrote:
> 
> 
> Hi Sumita,
> 
> Were you using spot instances? If not, I believe there's a default
> limit of 20 flat-rate instances, which *could* be related to your
> issue. With spot instances you can create up to 100 instances by
> default. So if you need more than 20 nodes and do not wish to ask
> Amazon to increase your flat-rate instance limit, you should use
> spot instances:
> 
> $ starcluster start -s 30 -b 0.50 mycluster
> 
> That said, StarCluster itself places no limit on the number of nodes
> you can create. However, as you've seen, EC2 instances sometimes take
> longer than usual to reach the 'running' state. Unfortunately this is
> purely an EC2 back-end issue that StarCluster cannot resolve
> directly. In my experience 22 minutes *is* quite a while to wait for
> an instance to come up; however, I have had instances take up to 15
> minutes in the past, so this is not a total surprise to me.
> 
> In the future, if an instance stays in 'pending' instead of changing
> to 'running' for too long (e.g. 15+ minutes), I would recommend
> terminating the faulty instance from the AWS console and then
> restarting the cluster using:
> 
> $ starcluster restart mycluster
> 
> This should reboot all currently running instances and begin
> configuring the cluster, so you avoid having to terminate the entire
> cluster and lose instance hours.
> 
> HTH,
> 
> ~Justin
> 
> 
> On 11/8/11 6:39 AM, Sumita Sinha wrote:
>> Hello,
>>
>> I'm currently working with StarCluster on EC2.
>>
>> I tried creating a cluster of 30 m1.small nodes using AMI
>> ami-8cf913e5. Cluster creation never completed because one node,
>> node025, was stuck in 'pending' status. I waited almost 22 minutes
>> and then terminated the cluster, which terminated properly. Is there
>> any limit on the number of nodes that can be created?
>>
>> -- Regards Sumita Sinha
>
> -- Regards Sumita Sinha
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk66IZYACgkQ4llAkMfDcrmZ5ACeIPTP8ZiFKTlTNxif6SgIKsWm
SmoAnA08GWFcOcmpCF+MMHwLzhqzD0Va
=KFye
-----END PGP SIGNATURE-----


