[StarCluster] Issue creating a cluster of 30 nodes with starcluster
Justin Riley
jtriley at MIT.EDU
Wed Nov 9 01:45:42 EST 2011
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Sumita,
Unless you've specifically submitted a request to Amazon to increase
your instance limit[1], I suspect you're running into the default
limit of 20 flat-rate instances I mentioned earlier.
I would recommend trying spot instances[2]; they're usually cheaper
than flat-rate instances AND you can launch up to 100 of them. To
request a spot cluster just pass the --bid option to the start command:
$ starcluster start --bid 0.50 mycluster
This will place a $0.50 spot bid on each node in the cluster except
for the master. The master node is always launched as a flat-rate
instance for stability.
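
If it helps to see the arithmetic, here is a rough Python sketch (not part
of StarCluster; the function name and constants are made up for
illustration) that checks a requested cluster size against the default
limits mentioned in this thread and builds the matching start command. It
assumes the limits are still 20 flat-rate and 100 spot instances and that
the master is the only flat-rate node in a spot cluster.

```python
# Default limits as described in this thread; adjust them if Amazon has
# raised yours. These constants are illustrative, not read from EC2.
FLAT_RATE_LIMIT = 20
SPOT_LIMIT = 100


def plan_start_command(name, cluster_size, bid=None):
    """Build a `starcluster start` command line, or raise ValueError if
    the request would exceed the default instance limits."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    if bid is None:
        # No bid: every node, master included, is a flat-rate instance.
        flat_rate, spot = cluster_size, 0
    else:
        # With a bid, only the master is flat-rate; the rest are spot.
        flat_rate, spot = 1, cluster_size - 1
    if flat_rate > FLAT_RATE_LIMIT:
        raise ValueError("%d flat-rate instances exceeds the default "
                         "limit of %d" % (flat_rate, FLAT_RATE_LIMIT))
    if spot > SPOT_LIMIT:
        raise ValueError("%d spot instances exceeds the default "
                         "limit of %d" % (spot, SPOT_LIMIT))
    cmd = ["starcluster", "start", "-s", str(cluster_size)]
    if bid is not None:
        cmd += ["--bid", "%.2f" % bid]
    cmd.append(name)
    return cmd


print(" ".join(plan_start_command("mycluster", 30, bid=0.50)))
```

Calling it for 30 nodes without a bid raises an error, which is the
situation I suspect you're in; with a bid, the same 30 nodes need only one
flat-rate instance plus 29 spot instances.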
To help you decide on a reasonable spot bid, use the spothistory command:
$ starcluster spothistory m1.large
With that said, you can check which nodes have SSH up using:
$ starcluster listclusters --show-ssh-status
Also, you can *always* reboot all nodes in the cluster and completely
reconfigure the cluster using the "restart" command:
$ starcluster restart mycluster
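
To make the "how long is too long" rule of thumb from my earlier message
concrete, here is a small Python sketch. It is only an illustration: the
function name, the node-state dictionary and the sample data are all
hypothetical, and it assumes you collect each node's state yourself (for
example from the AWS console). It flags nodes that are still 'pending'
after roughly 15 minutes, which are the ones I'd terminate before running
the restart command.

```python
def stalled_nodes(node_states, minutes_waited, threshold_minutes=15):
    """Return the aliases of nodes still 'pending' once the wait has
    reached the threshold; return an empty list before that."""
    if minutes_waited < threshold_minutes:
        return []
    return sorted(alias for alias, state in node_states.items()
                  if state == "pending")


# Illustrative data only, loosely modelled on the case in this thread.
states = {"master": "running", "node024": "running", "node025": "pending"}
for alias in stalled_nodes(states, minutes_waited=22):
    print("terminate %s from the AWS console, then run: "
          "starcluster restart mycluster" % alias)
```

The 15 minute threshold is just my rule of thumb, not something
StarCluster or EC2 enforces.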
HTH,
~Justin
[1] http://aws.amazon.com/contact-us/ec2-request/
[2] http://aws.amazon.com/ec2/spot-instances/
On 11/08/2011 07:20 PM, Sumita Sinha wrote:
> Hi Justin,
>
> I tried creating a 30-node cluster again and found something
> new. I have been waiting for the last 20 min for the cluster to come
> up, and I get the message below. Currently in EC2 all the nodes are
> up and running, but I don't know which node is taking time for SSH
> configuration, so I am not able to restart or terminate a node.
>
> >>> Using default cluster template: smallcluster
> >>> Validating cluster template settings...
> >>> Cluster template settings are valid
> >>> Starting cluster...
> >>> Launching a 30-node cluster...
> >>> Creating security group @sc-smallcluster...
> Reservation:r-0e2d7060
> >>> Waiting for cluster to come up... (updating every 30s)
> >>> Waiting for all nodes to be in a 'running' state...
> 29/29 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> >>> Waiting for SSH to come up on all nodes...
> 28/29 |------------------------------------------------------------- | 96%
>
>
> Regards Sumita
>
> On Tue, Nov 8, 2011 at 7:42 PM, Justin Riley <jtriley at mit.edu> wrote:
>
>
> Hi Sumita,
>
> Were you using spot instances? If not, I believe there's a default
> limit of 20 instances for flat-rate instances, which
> *could* be related to your issue. With spot instances you can
> create up to 100 instances by default. So, if you need more than
> 20 nodes and do not wish to submit a request to Amazon to increase
> your flat-rate instance limit, you should be using spot instances:
>
> $ starcluster start -s 30 -b 0.50 mycluster
>
> With that said, StarCluster has no limit to the number of nodes you
> can create; however, as you've seen, sometimes EC2 instances can
> take longer to become 'running' than usual. Unfortunately this is
> purely an EC2 back-end issue that cannot be resolved directly by
> StarCluster. In my experience 22 minutes *is* quite a while to wait
> for any instance to come up; however, I have had instances take up
> to 15 min in the past, so this is not a total surprise to
> me.
>
> In the future if you run into this problem of waiting for an
> instance to change from 'pending' to 'running' for too long (e.g.
> 15min+) I would recommend simply terminating the faulty instance
> from the AWS console and then restarting the cluster using:
>
> $ starcluster restart mycluster
>
> This should reboot all the currently running instances and begin
> configuring the cluster and avoid having to terminate the entire
> cluster and lose instance hours.
>
> HTH,
>
> ~Justin
>
>
> On 11/8/11 6:39 AM, Sumita Sinha wrote:
>> Hello,
>
>> Currently working with starcluster on EC2.
>
>> Tried creating a cluster with 30 nodes of type m1.small using
>> AMI ami-8cf913e5.
>> Cluster creation was never completed as I found out that one
>> node, node025, was showing pending status.
>> I waited for almost 22 minutes then terminated the cluster.
>> The cluster was terminated properly. Is there any limit to the
>> creation of nodes?
>
>
>
>
>> -- Regards Sumita Sinha
>
>
>
>
>
>
>
> -- Regards Sumita Sinha
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk66IZYACgkQ4llAkMfDcrmZ5ACeIPTP8ZiFKTlTNxif6SgIKsWm
SmoAnA08GWFcOcmpCF+MMHwLzhqzD0Va
=KFye
-----END PGP SIGNATURE-----