[Starcluster] Starcluster hangs at Creating Cluster User

Dan Yamins dyamins at gmail.com
Thu Apr 15 14:18:08 EDT 2010


Hi,

I'm using StarCluster from the git repo. I think I have everything
configured properly. But when I try to start a 1-node cluster, the process
hangs at the "create user" step:

>>> Validating cluster settings...
>>> Cluster settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Launching master node...
>>> Master AMI: ami-a19e71c8
>>> Creating security group @sc-testcluster...
Reservation:r-56c3ca3e
>>> Waiting for cluster to start...
>>> The master node is ec2-184-73-33-230.compute-1.amazonaws.com

>>> Attaching volume vol-c3d927aa to master node...
>>> Setting up the cluster...
>>> Mounting EBS volume vol-c3d927aa on /home...
>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)
>>> Creating cluster user: gotdata

... and that's where it hangs.

I CAN log into the individual nodes -- both as master AND as "gotdata" --
using passwordless ssh. Here's what the /etc/hosts file looks like:

127.0.0.1 localhost.localdomain localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Since this is a 1-node cluster, I can't test passwordless login between nodes.

I can reproduce this problem with both the 32-bit and 64-bit base
StarCluster AMIs, as well as with the AMIs that I created from those.

When I try to create a 2-node cluster, the process hangs a step later:

>>> Validating cluster settings...
>>> Cluster settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Launching master node...
>>> Master AMI: ami-f129c798
>>> Creating security group @sc-testcluster...
Reservation:r-e8d9d080
>>> Launching worker nodes...
>>> Node AMI: ami-f129c798
Reservation:r-ead9d082
>>> Waiting for cluster to start...
>>> The master node is ec2-184-73-111-239.compute-1.amazonaws.com
>>> Attaching volume vol-c3d927aa to master node...
>>> Setting up the cluster...
>>> Mounting EBS volume vol-c3d927aa on /home...
>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)
>>> Creating cluster user: gotdata
>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)

... and there it hangs.

In this case, I can:
  -- log into the master and worker nodes as root, e.g. "starcluster
sshmaster testcluster" and "starcluster sshnode testcluster 1" work fine
  -- log into the master as user gotdata, but NOT into the worker
node, e.g. "starcluster sshnode -u gotdata testcluster 0" works but
"starcluster sshnode -u gotdata testcluster 1" DOESN'T.


Thanks!
Dan