<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif">Hello,<br><br>I keep getting errors when the load balancer tries to remove nodes. <br><br>According to the error messages (appended below), the failure involves the user object. I am just using the default user "sgeadmin". <br>
<br>I logged in to the node that was supposed to be removed and can verify that the following steps have been completed:<br><br></div><div class="gmail_default" style="font-family:verdana,sans-serif">1. the node has been removed from SGE<br>
</div><div class="gmail_default" style="font-family:verdana,sans-serif">2. NFS has been unmounted<br></div><div class="gmail_default" style="font-family:verdana,sans-serif">3. the sgeadmin user has been deleted<br></div><div class="gmail_default" style="font-family:verdana,sans-serif">
4. the hosts file no longer contains the IP address of any other node or of the master instance<br><br></div><div class="gmail_default" style="font-family:verdana,sans-serif">But this node has not been terminated and still shows up when I run "starcluster lc". <br>
</div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif">
Thanks!<br>Jin<br></div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif"><br></div><div class="gmail_default" style="font-family:verdana,sans-serif">
<br>>>> Running plugin starcluster.plugins.sge.SGEPlugin<br>>>> Removing node037 from SGE<br>>>> Updating SGE parallel environment 'orte'<br>50/50 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%<br>
>>> Adding parallel environment 'orte' to queue 'all.q'<br>>>> Running plugin starcluster.clustersetup.DefaultClusterSetup<br>>>> Removing node node037 (i-1f013c34)...<br>>>> Removing node037 from known_hosts files<br>
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':<br>!!! ERROR - Failed to remove node node037<br>Traceback (most recent call last):<br> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/balancers/sge/__init__.py", line 754, in _eval_remove_node<br>
self._cluster.remove_node(node)<br> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1050, in remove_node<br> force=force)<br> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1076, in remove_nodes<br>
reverse=True)<br> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1690, in run_plugins<br> self.run_plugin(plug, method_name=method_name, node=node)<br>
File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/cluster.py", line 1715, in run_plugin<br> func(*args)<br> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/clustersetup.py", line 407, in on_remove_node<br>
self._remove_from_known_hosts(node)<br> File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/clustersetup.py", line 397, in _remove_from_known_hosts<br> n.remove_from_known_hosts(self._user, [node])<br>
File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.95.5-py2.7.egg/starcluster/node.py", line 588, in remove_from_known_hosts<br> known_hosts_file = posixpath.join(user.pw_dir, '.ssh', 'known_hosts')<br>
AttributeError: 'NoneType' object has no attribute 'pw_dir'<br>>>> Sleeping...(looping again in 60 secs)<br><br></div></div>