[StarCluster] Large cluster (125 nodes) launch failure

Kyeong Soo (Joseph) Kim kyeongsoo.kim at gmail.com
Tue Mar 15 17:29:10 EDT 2011


Hi Justin and All,

This is to report a failure in launching a large cluster of 125
c1.xlarge nodes.

I tried to launch the cluster twice, but both times starcluster hung
for hours at the last of the following steps:

.....

>>> Launching node121 (ami: ami-2857a641, type: c1.xlarge)
>>> Launching node122 (ami: ami-2857a641, type: c1.xlarge)
>>> Launching node123 (ami: ami-2857a641, type: c1.xlarge)
>>> Launching node124 (ami: ami-2857a641, type: c1.xlarge)
>>> Creating security group @sc-hnrlcluster...
Reservation:r-7c264911
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
125/125 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Waiting for SSH to come up on all nodes...
125/125 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> The master node is ec2-75-101-230-197.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Attaching volume vol-467ecc2e to master node on /dev/sdz ...
>>> Configuring hostnames...
>>> Mounting EBS volume vol-467ecc2e on /home...
>>> Creating cluster user: kks (uid: 1001, gid: 1001)
>>> Configuring scratch space for user: kks
>>> Configuring /etc/hosts on each node


So far, the same configuration has worked for clusters of up to 15 nodes.

Any ideas?

With Regards,
Joseph
--
Kyeong Soo (Joseph) Kim, Ph.D.
Senior Lecturer in Networking
Room 112, Digital Technium
Multidisciplinary Nanotechnology Centre, College of Engineering
Swansea University, Singleton Park, Swansea SA2 8PP, Wales UK
TEL: +44 (0)1792 602024
EMAIL: k.s.kim_at_swansea.ac.uk
HOME: http://iat-hnrl.swan.ac.uk/ (group)
            http://iat-hnrl.swan.ac.uk/~kks/ (personal)



