[StarCluster] ? re CreateUsers error in .95.6
Lyn Gerner
schedulerqueen at gmail.com
Fri Aug 7 19:10:34 EDT 2015
And thanks for the detail about 'a' versus 'a+'. I've never had issues
before, so I'd rather not hack the code, but it may come to that.
Thanks again,
Lyn
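
On the 'a' versus 'a+' point, a rough local check may help. This is not StarCluster's code path, which goes through paramiko's SFTP client according to the traceback in this thread. With Python's built-in open(), both modes create a missing file, and ENOENT is raised only when a parent directory is missing. Whether the remote SFTP open behaves the same way is an assumption that would need checking against the installed paramiko. The helper name try_append and the placeholder key line are made up for illustration.

```python
import errno
import os
import tempfile


def try_append(path, mode):
    """Open path in the given mode and append one line.

    Returns None on success, or the errno value on failure.
    """
    try:
        with open(path, mode) as handle:
            handle.write("node002 ssh-rsa PLACEHOLDER\n")
        return None
    except IOError as exc:  # IOError is an alias of OSError on Python 3
        return exc.errno


scratch = tempfile.mkdtemp()

# Case 1: parent directory exists, file does not.
path_in_existing_dir = os.path.join(scratch, "known_hosts")
result_a = try_append(path_in_existing_dir, "a")
os.remove(path_in_existing_dir)  # reset so 'a+' also starts without the file
result_a_plus = try_append(path_in_existing_dir, "a+")

# Case 2: parent directory itself is missing.
path_in_missing_dir = os.path.join(scratch, "no_such_dir", "known_hosts")
missing_a = try_append(path_in_missing_dir, "a")
missing_a_plus = try_append(path_in_missing_dir, "a+")

print("existing dir:", result_a, result_a_plus)
print("missing dir: ", missing_a, missing_a_plus, "(ENOENT is %d)" % errno.ENOENT)
```

If the same holds remotely, a missing .ssh directory for the affected user would be a more likely trigger than the mode string, though nothing in the thread confirms that.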
On Fri, Aug 7, 2015 at 1:08 PM, Lyn Gerner <schedulerqueen at gmail.com> wrote:
> Thanks for the reply, Chris.
>
> (Un-closing this issue; it's back.)
>
> Nope, it was really there, with several lines in it (these are the only
> users of even plausible interest).
>
> root@AWS-VTMXmaster-w2c ~
> # wc -l ~/.ssh/known_hosts ~prod/.ssh/known_hosts
>   24 /root/.ssh/known_hosts
>    4 /home/prod/.ssh/known_hosts
>   28 total
>
> root@AWS-VTMXmaster-w2c ~
> # > /home/prod/.ssh/known_hosts
>
> root@AWS-VTMXmaster-w2c ~
> # > /root/.ssh/known_hosts
>
>
> Sadly, after launching a new cluster and successfully adding node001,
> my attempt to add a second node is hitting this failure again.
>
> Thanks for any more ideas.
>
> Best,
> Lyn
>
>
> On Fri, Aug 7, 2015 at 12:51 PM, Christopher Clearfield <
> chris.clearfield at system-logic.com> wrote:
>
>> I wonder if, instead of zeroing out the file, '> known_hosts' actually
>> created it?
>>
>> I noticed that add_to_known_hosts originally opens the file with:
>>
>>     khostsf = self.ssh.remote_file(known_hosts_file, 'a')
>>
>> That is 'a' rather than 'a+', so it will fail if the file doesn't exist
>> for some reason.
>>
>> –
>> C
>>
>>
>> On Fri, Aug 7, 2015 at 3:47 PM Lyn Gerner <schedulerqueen at gmail.com>
>> wrote:
>>
>>> Update/Close: Strangely, this particular issue was resolved by going to
>>> the master and zeroing the known_hosts file (as in "> known_hosts").
>>>
>>> On Fri, Aug 7, 2015 at 11:39 AM, Lyn Gerner <schedulerqueen at gmail.com>
>>> wrote:
>>>
>>>> Hi Developers,
>>>>
>>>> Sorry for the Fri afternoon query, but I'm getting an error never
>>>> before seen on an addnode, and it recurs even on a -x retry. Appreciate
>>>> any workaround/recovery suggestions for the following:
>>>>
>>>> # sc an -x -a node002 w2c
>>>> StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
>>>> Software Tools for Academics and Researchers (STAR)
>>>> Please submit bug reports to starcluster at mit.edu
>>>>
>>>> >>> Waiting for node(s) to come up... (updating every 30s)
>>>> >>> Waiting for all nodes to be in a 'running' state...
>>>> 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>> >>> Waiting for SSH to come up on all nodes...
>>>> 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>> >>> Waiting for cluster to come up took 0.206 mins
>>>> >>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>>> >>> Configuring hostnames...
>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>> >>> Configuring /etc/hosts on each node
>>>> 3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>> >>> Configuring NFS exports path(s):
>>>> /home /jobs/ /usr/share/jobs/ /pipe/
>>>> >>> Mounting all NFS export path(s) on 1 worker node(s)
>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>> >>> Setting up NFS took 0.021 mins
>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>> >>> Configuring scratch space for user(s): sgeadmin
>>>> 1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>>> >>> Configuring passwordless ssh for root
>>>> >>> Configuring passwordless ssh for sgeadmin
>>>> >>> Running plugin swap_addnode_w2c.VISwapConfigurator
>>>> >>> Configuring Swap on node002
>>>> >>> Running plugin starcluster.plugins.users.CreateUsers
>>>> >>> Creating 1 users on node002
>>>> >>> Adding node002 to known_hosts for 1 users
>>>> !!! ERROR - Error occured while running plugin 'starcluster.plugins.users.CreateUsers':
>>>> !!! ERROR - Unhandled exception occured
>>>> Traceback (most recent call last):
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cli.py", line 274, in main
>>>>     sc.execute(args)
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/commands/addnode.py", line 128, in execute
>>>>     no_create=self.opts.no_create)
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cluster.py", line 189, in add_nodes
>>>>     no_create=no_create)
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cluster.py", line 1042, in add_nodes
>>>>     self.run_plugins(method_name="on_add_node", node=node)
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cluster.py", line 1690, in run_plugins
>>>>     self.run_plugin(plug, method_name=method_name, node=node)
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/cluster.py", line 1715, in run_plugin
>>>>     func(*args)
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/plugins/users.py", line 164, in on_add_node
>>>>     master.add_to_known_hosts(user, [node])
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/node.py", line 578, in add_to_known_hosts
>>>>     khostsf = self.ssh.remote_file(known_hosts_file, 'a')
>>>>   File "/usr/lib/python2.6/site-packages/StarCluster-0.95.6-py2.6.egg/starcluster/sshutils.py", line 320, in remote_file
>>>>     rfile = self.sftp.open(file, mode)
>>>>   File "/usr/lib/python2.6/site-packages/paramiko-1.15.1-py2.6.egg/paramiko/sftp_client.py", line 327, in open
>>>>     t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
>>>>   File "/usr/lib/python2.6/site-packages/paramiko-1.15.1-py2.6.egg/paramiko/sftp_client.py", line 729, in _request
>>>>     return self._read_response(num)
>>>>   File "/usr/lib/python2.6/site-packages/paramiko-1.15.1-py2.6.egg/paramiko/sftp_client.py", line 776, in _read_response
>>>>     self._convert_status(msg)
>>>>   File "/usr/lib/python2.6/site-packages/paramiko-1.15.1-py2.6.egg/paramiko/sftp_client.py", line 802, in _convert_status
>>>>     raise IOError(errno.ENOENT, text)
>>>> IOError: [Errno 2] No such file
>>>>
>>>> !!! ERROR - Oops! Looks like you've found a bug in StarCluster
>>>> !!! ERROR - Crash report written to: /root/.starcluster/logs/crash-report-11317.txt
>>>> !!! ERROR - Please remove any sensitive data from the crash report
>>>> !!! ERROR - and submit it to starcluster at mit.edu
>>>>
>>>>
>>>> There's not much more in the crash report, but I can send it, if it
>>>> will help. Thanks in advance.
>>>>
>>>>
>>>> Best,
>>>>
>>>> Lyn
>>>>
>>>
>>>
>>
>
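
One way to narrow down an ENOENT like the one in the traceback quoted above is to check, on the master, which parts of each user's known_hosts path actually exist. What follows is a minimal sketch, assuming the standard <home>/.ssh/known_hosts layout. The function name missing_components is hypothetical, and the two home directories are the ones shown in the wc output quoted earlier in this thread; the user that CreateUsers was handling may have a different home.

```python
import os


def missing_components(home_dir):
    """Return the parts of <home_dir>/.ssh/known_hosts that do not exist.

    The result is ordered from the outermost path inward, so the first
    entry is the first thing that would need to be created.
    """
    ssh_dir = os.path.join(home_dir, ".ssh")
    known_hosts = os.path.join(ssh_dir, "known_hosts")
    missing = []
    if not os.path.isdir(home_dir):
        missing.append(home_dir)
    if not os.path.isdir(ssh_dir):
        missing.append(ssh_dir)
    if not os.path.isfile(known_hosts):
        missing.append(known_hosts)
    return missing


if __name__ == "__main__":
    # Home directories taken from the thread; extend with any other
    # cluster users as needed.
    for home in ("/root", "/home/prod"):
        print(home, "->", missing_components(home) or "all present")
```

An empty result for every user would suggest the failure lies elsewhere, for example in a path that differs between the master and the account being created.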