[StarCluster] Fwd: Recover cluster after an error when starting
Milton Pividori
miltondp at gmail.com
Tue Nov 5 12:03:55 EST 2013
Sorry, I forgot to include the list.
---------- Forwarded message ----------
From: Milton Pividori <miltondp at gmail.com>
Date: 2013/11/5
Subject: Re: [StarCluster] Recover cluster after an error when starting
To: "MacMullan, Hugh" <hughmac at wharton.upenn.edu>
Thank you Hugh, I just discovered what "restart" does. I will try it next
time.
However, what I did this time was increase the mount timeout in the file
starcluster/node.py, in the mount_nfs_shares function, around line 725 (I am
using StarCluster 0.94.2). I added the mount option "timeo=20", and it worked.
Maybe it would be good to have a "timeout" option in the config file.
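
To illustrate the kind of change described above, here is a minimal sketch of
appending a "timeo" option to the options of an NFS fstab entry. This is not
the StarCluster source: the helper name nfs_fstab_entry and the default "rw"
option are assumptions made for the example. Note that, according to nfs(5),
timeo is given in tenths of a second.

```python
def nfs_fstab_entry(server, path, opts=("rw",), timeo=None):
    """Build one /etc/fstab line for an NFS share.

    Hypothetical helper for illustration only; it is not part of StarCluster.
    """
    options = list(opts)
    if timeo is not None:
        # nfs(5): timeo is expressed in tenths of a second.
        options.append("timeo=%d" % timeo)
    # Fields: device, mount point, fs type, options, dump, pass
    return "%s:%s %s nfs %s 0 0" % (server, path, path, ",".join(options))


print(nfs_fstab_entry("master", "/home", timeo=20))
# master:/home /home nfs rw,timeo=20 0 0
```

A configurable timeout, as suggested above, could then be passed through as
the timeo argument instead of being hard-coded.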
Thank you again!
2013/11/5 MacMullan, Hugh <hughmac at wharton.upenn.edu>
> Hi Milton:
>
> I would generally do a restart (starcluster restart mycluster).
>
> -Hugh
>
> From: starcluster-bounces at mit.edu [mailto:starcluster-bounces at mit.edu] On Behalf Of Milton Pividori
> Sent: Tuesday, November 05, 2013 11:50 AM
> To: starcluster at mit.edu
> Subject: [StarCluster] Recover cluster after an error when starting
>
> Hi all,
>
> I am a new user of StarCluster. First of all, thank you for this great
> software!
>
> My question is about how to recover a cluster after an error occurs while
> starting it. After I ran "starcluster start mycluster", I got a timeout
> error when mounting the /home directory (EBS volume). Is it possible to run
> the plugin again? In this case, I think the plugin is
> "starcluster.clustersetup.DefaultClusterSetup".
>
> This is the last part of the error I get (the cluster has 10 t1.micro
> nodes):
>
> >>> Starting NFS server on master
> >>> Configuring NFS exports path(s):
> /home
> >>> Mounting all NFS export path(s) on 9 worker node(s)
> 9/9 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
> !!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
> !!! ERROR - error occurred in job (id=node009): remote command 'source /etc/profile && mount /home' failed with status 32:
> mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 48, in run
>     job.run()
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/threadpool.py", line 75, in run
>     r = self.method(*self.args, **self.kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/node.py", line 731, in mount_nfs_shares
>     self.ssh.execute('mount %s' % path)
>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94.2-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
>     msg, command, exit_status, out_str)
> RemoteCommandFailed: remote command 'source /etc/profile && mount /home' failed with status 32:
> mount.nfs: mount to NFS server 'master:/home' failed: timed out, giving up
>
> Thank you!
>
> --
> Milton Pividori
> Blog: www.miltonpividori.com.ar
>
--
Milton Pividori
Blog: www.miltonpividori.com.ar