[StarCluster] removenode failing

Silverstein herc.silverstein at schrodinger.com
Thu Dec 10 00:59:54 EST 2015


I'm running the loadbalancer on a cluster that currently has a master
and 5 compute nodes (it was started with a master and 1 compute node).
The loadbalancer correctly detected that it should remove nodes, and it
removed them from SGE's execute list, but the nodes were still in the
cluster (listclusters still shows them).  I then killed the loadbalancer
and tried removing the nodes manually via "removenode".  This resulted in:

Remove 5 nodes from cluster5(y/n)? y
 >>> Running plugin elasticip.ElasticIPSetup
 >>> Running plugin schrowscoreconfigurator.SchrodingerConfiguratorPlugin
 >>> Running plugin starcluster.plugins.sge.SGEPlugin
 >>> Removing node006 from SGE
!!! ERROR - Error occured while running plugin 
'starcluster.plugins.sge.SGEPlugin':
!!! ERROR - remote command 'source /etc/profile && qconf -de node006'
!!! ERROR - failed with status 1:
!!! ERROR - denied: execution host "node006" does not exist

So I forcibly removed them.  When I do that, I get messages like this
for each node:

 >>> Terminating node: node006 (i-d16e4815)
 >>> Running plugin elasticip.ElasticIPSetup
 >>> Running plugin schrowscoreconfigurator.SchrodingerConfiguratorPlugin
 >>> Running plugin starcluster.plugins.sge.SGEPlugin
 >>> Removing node005 from SGE
!!! ERROR - Error occured while running plugin 
'starcluster.plugins.sge.SGEPlugin':

Has anyone experienced this?  If so, what is causing this?
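
My guess, going by the "does not exist" message above, is that the
loadbalancer had already dropped the hosts from SGE, so the later
"qconf -de" had nothing left to delete.  Below is a minimal sketch of
the kind of guard I have in mind, not StarCluster's actual code.  The
function name hosts_still_in_sge and the sample host names are
hypothetical, and I am assuming that "qconf -sel" prints one execution
host per line using the same short aliases (on some setups it may print
fully qualified names instead).

```python
def hosts_still_in_sge(qconf_sel_output, aliases):
    """Return the aliases that SGE still lists as execution hosts.

    qconf_sel_output is the text printed by `qconf -sel`; the sketch
    assumes one host name per line, matching the node aliases.
    """
    listed = {
        line.strip()
        for line in qconf_sel_output.splitlines()
        if line.strip()
    }
    # Keep the caller's ordering so the log output stays predictable.
    return [alias for alias in aliases if alias in listed]


# Hypothetical `qconf -sel` output after some hosts were already removed.
sel_output = "master\nnode001\n"

to_delete = hosts_still_in_sge(sel_output, ["node001", "node006"])
for alias in to_delete:
    # Only hosts SGE still knows about would get a `qconf -de`.
    print("would run: qconf -de %s" % alias)
```

With a check like this, a host that is already gone from the execute
list would simply be skipped instead of aborting the plugin.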

Herc


