[StarCluster] [Star Cluster] NoneType user errors when removing nodes

Jin Yu yujin2004 at gmail.com
Sun Jul 20 16:08:22 EDT 2014


Hello,

I keep running into errors when trying to remove nodes using the
load balancer.

From the error messages (appended below), the error concerns the
user object. I am just using the default user "sgeadmin".

I logged in to the node that StarCluster tried to remove and can verify
that the following steps have been done:

1. the node has been removed from SGE
2. NFS has been unmounted
3. sgeadmin user has been deleted
4. the hosts file no longer contains the IP of any other node or of the master instance

But this node is not terminated and still shows up when I run "starcluster lc".


Thanks!
Jin



>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node037 from SGE
>>> Updating SGE parallel environment 'orte'
50/50 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Removing node node037 (i-1f013c34)...
>>> Removing node037 from known_hosts files
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
!!! ERROR - Failed to remove node node037
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/balancers/sge/__init__.py", line 754, in _eval_remove_node
    self._cluster.remove_node(node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1050, in remove_node
    force=force)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1076, in remove_nodes
    reverse=True)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1690, in run_plugins
    self.run_plugin(plug, method_name=method_name, node=node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1715, in run_plugin
    func(*args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/clustersetup.py", line 407, in on_remove_node
    self._remove_from_known_hosts(node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/clustersetup.py", line 397, in _remove_from_known_hosts
    n.remove_from_known_hosts(self._user, [node])
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/node.py", line 588, in remove_from_known_hosts
    known_hosts_file = posixpath.join(user.pw_dir, '.ssh', 'known_hosts')
AttributeError: 'NoneType' object has no attribute 'pw_dir'
>>> Sleeping...(looping again in 60 secs)
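
The last frame of the traceback fails on user.pw_dir because the value
bound to "user" is None, so it looks as if the lookup of the cluster user
came back empty on at least one node. That would fit step 3 above (sgeadmin
already deleted), though this is only a guess from the traceback. Below is a
minimal, standalone sketch of a defensive version of that path computation.
PasswdEntry and known_hosts_path are hypothetical names used for
illustration; this is not StarCluster's code and not a tested patch.

```python
import posixpath
from collections import namedtuple

# Hypothetical stand-in for a passwd entry; only the fields used here.
PasswdEntry = namedtuple("PasswdEntry", ["pw_name", "pw_dir"])


def known_hosts_path(user):
    """Return the user's known_hosts path, or None if no user was found."""
    if user is None:
        # Same situation as in the traceback: there is no passwd entry,
        # so there is no home directory to build a path from.
        return None
    return posixpath.join(user.pw_dir, ".ssh", "known_hosts")


print(known_hosts_path(PasswdEntry("sgeadmin", "/home/sgeadmin")))
print(known_hosts_path(None))
```

With a guard like this the caller could skip the known_hosts cleanup for a
missing user instead of aborting the whole node removal. Whether skipping
is the right behaviour for StarCluster is an open question.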