[Starcluster] Starcluster hangs at Creating Cluster User

Justin Riley jtriley at MIT.EDU
Fri Apr 16 09:59:58 EDT 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Dan,

Just following up for others who might be interested. This is due to
StarCluster recursively chown'ing all mounted EBS volumes so that they are
owned by cluster_user. Unfortunately, on a volume holding many GB of data,
this takes *forever*.
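
To illustrate the idea (this is only a rough sketch, not StarCluster's
actual code and not necessarily the fix that will land), one way to avoid
the cost is to skip the recursive pass when the mount point already has the
right owner, and otherwise change only the entries whose owner differs. The
function names below are hypothetical, and the `chown` parameter exists only
so the logic can be exercised without root:

```python
import os


def needs_chown(mount_point, uid, gid):
    """Cheap check: look only at the mount point itself (hypothetical helper)."""
    st = os.lstat(mount_point)
    return st.st_uid != uid or st.st_gid != gid


def iter_paths(root):
    """Yield root and every directory and file below it."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield os.path.join(dirpath, name)


def chown_mismatched(root, uid, gid, chown=os.lchown):
    """Chown only entries not already owned by uid:gid.

    Returns the number of entries changed. Symlinks are not followed.
    """
    changed = 0
    for path in iter_paths(root):
        st = os.lstat(path)
        if st.st_uid != uid or st.st_gid != gid:
            chown(path, uid, gid)
            changed += 1
    return changed
```

Note that chown_mismatched still has to stat every file, so on a very large
volume the needs_chown check on the mount point is what actually saves the
time; the walk only avoids redundant writes.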

I'm working on a solution to this.

~Justin

On 04/15/2010 02:18 PM, Dan Yamins wrote:
> Hi,
> 
> I'm using StarCluster from the git repo. I think I have everything
> configured properly, but when I try to start a 1-node cluster, the process
> hangs at the "create user" step:
> 
>>>> Validating cluster settings...
>>>> Cluster settings are valid
>>>> Starting cluster...
>>>> Launching a 1-node cluster...
>>>> Launching master node...
>>>> Master AMI: ami-a19e71c8
>>>> Creating security group @sc-testcluster...
> Reservation:r-56c3ca3e
>>>> Waiting for cluster to start...
>>>> The master node is ec2-184-73-33-230.compute-1.amazonaws.com
> <http://ec2-184-73-33-230.compute-1.amazonaws.com>
>  
>>>> Attaching volume vol-c3d927aa to master node...
>>>> Setting up the cluster...
>>>> Mounting EBS volume vol-c3d927aa on /home...
>>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)
>>>> Creating cluster user: gotdata
> 
> ... and that's where it hangs.
> 
> I CAN log into the individual nodes -- both as master AND as "gotdata"
> -- using passwordless ssh.   Here's what the /etc/hosts file looks like:
> 
> 127.0.0.1 localhost.localdomain localhost
> 
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> 
> Since this is a 1-node cluster, I can't test passwordless login between
> nodes.
> 
> I can reproduce this problem with both the 32-bit and 64-bit base
> StarCluster AMIs, as well as with the AMIs that I created from those.
> 
> When I try to create a 2-node cluster, the process hangs a step later:
> 
>>>> Validating cluster settings...
>>>> Cluster settings are valid
>>>> Starting cluster...
>>>> Launching a 2-node cluster...
>>>> Launching master node...
>>>> Master AMI: ami-f129c798
>>>> Creating security group @sc-testcluster...
> Reservation:r-e8d9d080
>>>> Launching worker nodes...
>>>> Node AMI: ami-f129c798
> Reservation:r-ead9d082
>>>> Waiting for cluster to start...
>>>> The master node is ec2-184-73-111-239.compute-1.amazonaws.com
> <http://ec2-184-73-111-239.compute-1.amazonaws.com>
>>>> Attaching volume vol-c3d927aa to master node...
>>>> Setting up the cluster...
>>>> Mounting EBS volume vol-c3d927aa on /home...
>>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)
>>>> Creating cluster user: gotdata
>>>> Using private key /Users/danielyamins/amazon/id_rsa-gsg-keypair (rsa)
> 
> .... and there it hangs. 
> 
> In this case, I can:
>   -- log into the master and worker nodes as root: e.g. "starcluster
> sshmaster testcluster" and "starcluster sshnode testcluster 1" work fine
>   -- log into the master as user gotdata, but NOT into the worker
> node: e.g. "starcluster sshnode -u gotdata testcluster 0" works but
> "starcluster sshnode -u gotdata testcluster 1" DOESN'T.
> 
> 
> Thanks!
> Dan
> 
> _______________________________________________
> Starcluster mailing list
> Starcluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkvIbV4ACgkQ4llAkMfDcrn0UACfaMwr2DJ+vqvQXwZvHnTp3EJF
OmMAn1jp+ySTlRUftkZRarEEiig9ZxMo
=Wy36
-----END PGP SIGNATURE-----


