[StarCluster] ? re CreateUsers error in 0.95.6
Lyn Gerner
schedulerqueen at gmail.com
Fri Aug 7 19:08:44 EDT 2015
Thanks for the reply, Chris.
(Un-closing this issue; it's back.)
Nope, it was really there, with several lines in it (these are the only
users of even plausible interest).
root at AWS-VTMXmaster-w2c ~
# wc -l ~/.ssh/known_hosts ~prod/.ssh/known_hosts
 24 /root/.ssh/known_hosts
  4 /home/prod/.ssh/known_hosts
 28 total
root at AWS-VTMXmaster-w2c ~
# > /home/prod/.ssh/known_hosts
root at AWS-VTMXmaster-w2c ~
# > /root/.ssh/known_hosts
Sadly, after launching a new cluster and successfully adding node001, my
attempt to add a second node is again hitting this failure.
Thanks for any more ideas.
Best,
Lyn
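For reference, in bash and POSIX sh a bare `> file` redirection (no command), as used in the session above, empties an existing file and creates the file if it is missing; it fails only when the parent directory does not exist. A minimal local sketch of the same semantics in Python, assuming the helper name `truncate_or_create` (hypothetical, for illustration only) and temporary paths rather than the cluster's real known_hosts files:

```python
import os
import tempfile


def truncate_or_create(path):
    """Mimic the shell's bare `> path`: empty the file, creating it if absent.

    Hypothetical helper for illustration only. Like the shell redirection,
    it raises FileNotFoundError if the parent directory does not exist.
    """
    with open(path, "w"):
        pass  # opening with "w" truncates (or creates) the file; nothing is written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        khosts = os.path.join(home, "known_hosts")

        # Existing file with content: the truncation empties it.
        with open(khosts, "w") as f:
            f.write("placeholder line\n")
        truncate_or_create(khosts)
        print(os.path.getsize(khosts))  # 0

        # Missing file: the truncation creates it, empty.
        os.remove(khosts)
        truncate_or_create(khosts)
        print(os.path.exists(khosts), os.path.getsize(khosts))  # True 0
```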
On Fri, Aug 7, 2015 at 12:51 PM, Christopher Clearfield <
chris.clearfield at system-logic.com> wrote:
> I wonder if, instead of zeroing out the file, '> known_hosts' actually
> created it?
>
> I noticed that add_to_known_hosts originally opens the file with:
>
>     khostsf = self.ssh.remote_file(known_hosts_file, 'a')
>
> 'a', rather than 'a+', so it will fail if the file doesn't exist for some
> reason.
>
> –
> C
>
>
> On Fri, Aug 7, 2015 at 3:47 PM Lyn Gerner <schedulerqueen at gmail.com>
> wrote:
>
>> Update/Close: Strangely, this particular issue was resolved by going to
>> the master and zeroing the known_hosts file (as in "> known_hosts").
>>
>> On Fri, Aug 7, 2015 at 11:39 AM, Lyn Gerner <schedulerqueen at gmail.com>
>> wrote:
>>
>>> Hi Developers,
>>>
>>> Sorry for the Friday afternoon query, but I'm getting an error I've never
>>> seen before on an addnode, and it recurs even on a -x retry. I'd appreciate
>>> any workaround/recovery suggestions for the following:
>>>
>>> # sc an -x -a node002 w2c
>>> StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
>>> Software Tools for Academics and Researchers (STAR)
>>> Please submit bug reports to starcluster at mit.edu
>>>
>>> >>> Waiting for node(s) to come up... (updating every 30s)
>>> >>> Waiting for all nodes to be in a 'running' state...
>>> 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> >>> Waiting for SSH to come up on all nodes...
>>> 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> >>> Waiting for cluster to come up took 0.206 mins
>>> >>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> >>> Configuring hostnames...
>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> >>> Configuring /etc/hosts on each node
>>> 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> >>> Configuring NFS exports path(s):
>>> /home /jobs/ /usr/share/jobs/ /pipe/
>>> >>> Mounting all NFS export path(s) on 1 worker node(s)
>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> >>> Setting up NFS took 0.021 mins
>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> >>> Configuring scratch space for user(s): sgeadmin
>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> >>> Configuring passwordless ssh for root
>>> >>> Configuring passwordless ssh for sgeadmin
>>> >>> Running plugin swap_addnode_w2c.VISwapConfigurator
>>> >>> Configuring Swap on node002
>>> >>> Running plugin starcluster.plugins.users.CreateUsers
>>> >>> Creating 1 users on node002
>>> >>> Adding node002 to known_hosts for 1 users
>>> !!! ERROR - Error occured while running plugin 'starcluster.plugins.users.CreateUsers':
>>> !!! ERROR - Unhandled exception occured
>>> Traceback (most recent call last):
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cli.py", line 274, in main
>>>     sc.execute(args)
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/commands/addnode.py", line 128, in execute
>>>     no_create=self.opts.no_create)
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cluster.py", line 189, in add_nodes
>>>     no_create=no_create)
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cluster.py", line 1042, in add_nodes
>>>     self.run_plugins(method_name="on_add_node", node=node)
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cluster.py", line 1690, in run_plugins
>>>     self.run_plugin(plug, method_name=method_name, node=node)
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cluster.py", line 1715, in run_plugin
>>>     func(*args)
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/plugins/users.py", line 164, in on_add_node
>>>     master.add_to_known_hosts(user, [node])
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/node.py", line 578, in add_to_known_hosts
>>>     khostsf = self.ssh.remote_file(known_hosts_file, 'a')
>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/sshutils.py", line 320, in remote_file
>>>     rfile = self.sftp.open(file, mode)
>>>   File "/usr/lib/python2.6/site-packages/paramiko-1.15.1-py2.6.egg/paramiko/sftp_client.py", line 327, in open
>>>     t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
>>>   File "/usr/lib/python2.6/site-packages/paramiko-1.15.1-py2.6.egg/paramiko/sftp_client.py", line 729, in _request
>>>     return self._read_response(num)
>>>   File "/usr/lib/python2.6/site-packages/paramiko-1.15.1-py2.6.egg/paramiko/sftp_client.py", line 776, in _read_response
>>>     self._convert_status(msg)
>>>   File "/usr/lib/python2.6/site-packages/paramiko-1.15.1-py2.6.egg/paramiko/sftp_client.py", line 802, in _convert_status
>>>     raise IOError(errno.ENOENT, text)
>>> IOError: [Errno 2] No such file
>>>
>>> !!! ERROR - Oops! Looks like you've found a bug in StarCluster
>>> !!! ERROR - Crash report written to: /root/.starcluster/logs/crash-report-11317.txt
>>> !!! ERROR - Please remove any sensitive data from the crash report
>>> !!! ERROR - and submit it to starcluster at mit.edu
>>>
>>>
>>> There's not much more in the crash report, but I can send it, if it will
>>> help. Thanks in advance.
>>>
>>>
>>> Best,
>>>
>>> Lyn
>>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster at mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>
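Whether append mode 'a' can create a missing file is easy to check locally. The sketch below uses Python's built-in open(), not paramiko's SFTPClient.open (the call that actually fails in the traceback), so it only illustrates local append-mode semantics; how paramiko maps the mode string to SFTP open flags is a separate code path this sketch does not exercise. Locally, 'a' creates a missing file, and the ENOENT error (errno 2, the same code as the traceback's "[Errno 2] No such file") appears when a parent directory such as .ssh is missing. The helper names `append_line` and `try_append` are hypothetical:

```python
import errno
import os
import tempfile


def append_line(path, line):
    """Append one line to path using append mode ('a')."""
    with open(path, "a") as f:
        f.write(line + "\n")


def try_append(path, line):
    """Return None on success, or the errno of the OSError raised by open()."""
    try:
        append_line(path, line)
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as home:
        ssh_dir = os.path.join(home, ".ssh")
        khosts = os.path.join(ssh_dir, "known_hosts")

        # Parent directory missing: open(..., 'a') fails with ENOENT.
        print(try_append(khosts, "node002 placeholder-key") == errno.ENOENT)  # True

        # Parent present, file missing: 'a' creates the file.
        os.mkdir(ssh_dir, 0o700)
        print(try_append(khosts, "node002 placeholder-key"))  # None
        print(os.path.exists(khosts))  # True
```

In other words, on a local filesystem a missing known_hosts file alone would not produce errno 2 under mode 'a'; a missing parent directory would.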