[StarCluster] removenode failing
Silverstein
herc.silverstein at schrodinger.com
Thu Dec 10 00:59:54 EST 2015
I'm running the loadbalancer on a cluster with 5 compute nodes and a
master (it was started with a master and 1 compute node). The
loadbalancer correctly detected that it should remove nodes, and it
removed them from SGE's execution host list, but the nodes were still
in the cluster (listclusters shows them). I then killed the
loadbalancer and tried removing the nodes manually via "removenode".
This resulted in:
Remove 5 nodes from cluster5(y/n)? y
>>> Running plugin elasticip.ElasticIPSetup
>>> Running plugin schrowscoreconfigurator.SchrodingerConfiguratorPlugin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node006 from SGE
!!! ERROR - Error occured while running plugin 'starcluster.plugins.sge.SGEPlugin':
!!! ERROR - remote command 'source /etc/profile && qconf -de node006'
!!! ERROR - failed with status 1:
!!! ERROR - denied: execution host "node006" does not exist
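The error is consistent with "qconf -de" being run for a host that the
loadbalancer had already dropped from SGE's execution host list. As a
rough illustration of the kind of guard that would avoid it, here is a
minimal sketch that filters the nodes against the output of "qconf -sel"
(which lists the execution hosts) before any "qconf -de" call. This is
not StarCluster's actual code: the function name hosts_still_in_sge and
the sample output are hypothetical, and it assumes SGE lists the hosts
under the same aliases that StarCluster uses.

```python
def hosts_still_in_sge(qconf_sel_output, aliases):
    """Return the aliases that SGE still lists as execution hosts.

    qconf_sel_output is the raw text printed by `qconf -sel`,
    one host name per line (assumed to match the node aliases).
    """
    registered = {
        line.strip()
        for line in qconf_sel_output.splitlines()
        if line.strip()
    }
    return [alias for alias in aliases if alias in registered]


# Hypothetical `qconf -sel` output after the loadbalancer has already
# removed node005 and node006 from the execution host list.
sel_output = "master\nnode001\n"
to_remove = ["node005", "node006"]

# Only hosts returned here would need a `qconf -de <host>` call.
print(hosts_still_in_sge(sel_output, to_remove))  # prints []
```

With a check like this, the SGE step would be skipped for nodes that are
already gone from the execution host list, and the remaining steps of
the removal could proceed.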
So I forcibly removed them. When I do that, I get messages like this
for each node:
>>> Terminating node: node006 (i-d16e4815)
>>> Running plugin elasticip.ElasticIPSetup
>>> Running plugin schrowscoreconfigurator.SchrodingerConfiguratorPlugin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node005 from SGE
!!! ERROR - Error occured while running plugin 'starcluster.plugins.sge.SGEPlugin':
Has anyone experienced this? If so, what is causing this?
Herc