[Starcluster] cluster start-up hangs when starting Sun Grid Engine

Damian Eads eads at soe.ucsc.edu
Fri Apr 16 14:53:47 EDT 2010


Hi,

I reserved three c1.xlarge instances (24 cores, 3 nodes, 8 cores per
node), one master and two workers, all in the same availability zone
(us-east-1a). Installing the Sun Grid Engine hangs for a very long
time. I terminated the cluster and tried again, to no avail. The third
time, when I logged in, my EBS volume mounted at /data was not
visible over NFS on the worker nodes.
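
One way the NFS symptom could be checked on a worker node is sketched
below. This is only a minimal sketch, not part of StarCluster: the
helper name nfs_mount_for is hypothetical, and it assumes a Linux node
where /proc/mounts lists one mount per line as "device mountpoint
fstype options ...". It reports whether /data is currently backed by an
NFS mount.

```python
import os


def nfs_mount_for(mount_point, mounts_text):
    """Return (device, fstype) if mount_point is an NFS mount, else None.

    mounts_text is the content of /proc/mounts (assumed Linux format).
    nfs_mount_for is a hypothetical helper, not a StarCluster function.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, target, fstype = fields[0], fields[1], fields[2]
        if target == mount_point:
            # A later entry for the same mount point shadows earlier ones.
            found = (device, fstype)
    if found is not None and found[1].startswith("nfs"):
        return found
    return None


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as fh:
        result = nfs_mount_for("/data", fh.read())
    print("/data NFS mount: %s" % (result,))
```

Run on each worker, a result of None would mean /data is either not
mounted there or is mounted from something other than NFS.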

eads at argentina:~/work/repo/StarCluster$ starcluster start -x mycluster dtest
/tmp/qqq/lib/python2.6/site-packages/pycrypto-2.0.1-py2.6-linux-i686.egg/Crypto/Hash/SHA.py:6:
DeprecationWarning: the sha module is deprecated; use the hashlib
module instead
/tmp/qqq/lib/python2.6/site-packages/pycrypto-2.0.1-py2.6-linux-i686.egg/Crypto/Hash/MD5.py:6:
DeprecationWarning: the md5 module is deprecated; use hashlib instead
/var/lib/python-support/python2.6/IPython/Magic.py:38:
DeprecationWarning: the sets module is deprecated
  from sets import Set
StarCluster - (http://web.mit.edu/starcluster)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster at mit.edu

>>> Validating cluster settings...
>>> Cluster settings are valid
>>> Starting cluster...
>>> Waiting for cluster to start...
>>> The master node is ec2-184-73-86-18.compute-1.amazonaws.com
>>> Attaching volume vol-c5e85dac to master node...
>>> Setting up the cluster...
>>> Mounting EBS volume vol-c5e85dac on /data...
ssh.py:65 - WARNING - specified key does not end in either rsa or dsa,
trying both
>>> Using private key /home/eads/deadskey.pem (rsa)
>>> Creating cluster user: sgeadmin
ssh.py:65 - WARNING - specified key does not end in either rsa or dsa,
trying both
>>> Using private key /home/eads/deadskey.pem (rsa)
ssh.py:65 - WARNING - specified key does not end in either rsa or dsa,
trying both
>>> Using private key /home/eads/deadskey.pem (rsa)
>>> Configuring scratch space for user: sgeadmin
>>> Configuring /etc/hosts on each node
>>> Configuring NFS...
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for user: sgeadmin
>>> Using existing RSA ssh keys found for user: sgeadmin
>>> Installing Sun Grid Engine...

It hangs on this step. I've reproduced this three times. Any ideas?

Thanks a lot in advance!

Damian



-----------------------------------------------------
Damian Eads                           Ph.D. Candidate
University of California             Computer Science
1156 High Street         Machine Learning Lab, E2-489
Santa Cruz, CA 95064    http://www.soe.ucsc.edu/~eads


