[StarCluster] Recover cluster after an error when starting

MacMullan, Hugh hughmac at wharton.upenn.edu
Tue Nov 5 11:55:17 EST 2013


Hi Milton:

In this situation I would generally just restart the cluster (starcluster restart mycluster). That reboots the nodes and runs the cluster setup again.
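
If the NFS timeout turns out to be transient, the restart may need more than one try. Below is a minimal sketch of wrapping the restart command in a retry loop. The helper names (restart_command, restart_with_retries), the attempt count, and the delay are hypothetical and not part of StarCluster. The sketch also assumes the starcluster CLI exits with a non-zero status when setup fails.

```python
import subprocess
import time


def restart_command(cluster_tag):
    # Same CLI call as quoted above: starcluster restart <cluster_tag>
    return ["starcluster", "restart", cluster_tag]


def restart_with_retries(cluster_tag, attempts=3, delay=60, run=subprocess.call):
    """Run the restart up to `attempts` times.

    Returns True as soon as one run exits with status 0, False otherwise.
    Assumption: a failed setup makes the CLI exit with a non-zero status.
    `run` is injectable so the loop can be exercised without a real cluster.
    """
    cmd = restart_command(cluster_tag)
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return True
        if attempt < attempts:
            # Give the master's NFS server some time before the next try.
            time.sleep(delay)
    return False
```

Calling restart_with_retries("mycluster") would then issue the restart up to three times, waiting a minute between attempts.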

-Hugh

From: starcluster-bounces at mit.edu [mailto:starcluster-bounces at mit.edu] On Behalf Of Milton Pividori
Sent: Tuesday, November 05, 2013 11:50 AM
To: starcluster at mit.edu
Subject: [StarCluster] Recover cluster after an error when starting

Hi all,

I am a new user of StarCluster. First of all, thank you for this great software!

My question is how to recover a cluster after an error during startup. After running "starcluster start mycluster", I got a timeout error while mounting the /home directory (EBS volume). Is it possible to run the failed plugin again? In this case, I think the plugin is "starcluster.clustersetup.DefaultClusterSetup".

This is the last part of the error I get (the cluster has 10 t1.micro nodes):

>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home
>>> Mounting all NFS export path(s) on 9 worker node(s)
9/9 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
!!! ERROR - error occurred in job (id=node009): remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 48, in run
    job.run()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 75, in run
    r = self.method(*self.args, **self.kwargs)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/node.py", line 731, in mount_nfs_shares
    self.ssh.execute('mount %s' % path)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up

Thank you!

--
Milton Pividori
Blog: http://www.miltonpividori.com.ar