[StarCluster] Recover cluster after an error when starting
MacMullan, Hugh
hughmac at wharton.upenn.edu
Tue Nov 5 11:55:17 EST 2013
Hi Milton:
I would generally do a restart (starcluster restart mycluster).
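As a rough sketch of what that could look like in practice: restart the cluster, then check that /home is actually mounted on the node that failed. The function name recover_cluster is made up for illustration, node009 is just the node from the error output below, and this assumes your StarCluster version lets you pass a remote command to sshnode.

```shell
# Hypothetical helper: restart a cluster, then check the /home mount on one node.
# Usage: recover_cluster <cluster_tag> <node_alias>
recover_cluster() {
    cluster="$1"
    node="$2"

    # Reboot the nodes of the existing cluster and reconfigure them
    starcluster restart "$cluster" || return 1

    # Assumption: sshnode accepts a remote command as its last argument.
    # The command's output should show /home mounted from the master.
    starcluster sshnode "$cluster" "$node" 'mount | grep /home'
}
```

You would then call it as: recover_cluster mycluster node009. If the grep prints nothing, the NFS mount still did not come up on that node.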
-Hugh
From: starcluster-bounces at mit.edu On Behalf Of Milton Pividori
Sent: Tuesday, November 05, 2013 11:50 AM
To: starcluster at mit.edu
Subject: [StarCluster] Recover cluster after an error when starting
Hi all,
I am a new user of StarCluster. First of all, thank you for this great software!
My question is about how to recover a cluster after an error during startup. After running "starcluster start mycluster", I got a timeout error while mounting the /home directory (an EBS volume). Is it possible to run the plugin again? In this case, I think the plugin is "starcluster.clustersetup.DefaultClusterSetup".
This is the last part of the error output (the cluster has 10 t1.micro nodes):
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home
>>> Mounting all NFS export path(s) on 9 worker node(s)
9/9 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
!!! ERROR - error occurred in job (id=node009): remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 48, in run
    job.run()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 75, in run
    r = self.method(*self.args, **self.kwargs)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/node.py", line 731, in mount_nfs_shares
    self.ssh.execute('mount %s' % path)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && mount /home' failed with status 32:
mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
Thank you!
--
Milton Pividori
Blog: http://www.miltonpividori.com.ar