[StarCluster] Error removing node from SGE

David Erickson derickso at stanford.edu
Sat Feb 2 13:56:32 EST 2013


Hi All-
I am seeing the following in my logs when running loadbalancer and 
removing nodes:

*** WARNING - Removing node013: i-dc5dd8ac (ec2-54-234-124-79.compute-1.amazonaws.com)
 >>> Running plugin dnrc-cplex
 >>> Removing node node013 (i-dc5dd8ac)...
 >>> Removing node013 from known_hosts files
 >>> Removing node013 from /etc/hosts
 >>> Removing node013 from NFS
 >>> Removing node013 from SGE
!!! ERROR - command 'source /etc/profile && qconf -dconf node013' failed with status 1
 >>> Updating SGE parallel environment 'orte'
19/19 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
 >>> Adding parallel environment 'orte' to queue 'all.q'
 >>> Removing node node013 (i-dc5dd8ac)...
 >>> Removing node013 from known_hosts files
 >>> Removing node013 from /etc/hosts
 >>> Removing node013 from NFS
 >>> Canceling spot request sir-d69dda14
 >>> Terminating node: node013 (i-dc5dd8ac)

It eventually removes the node, but the qconf -dconf command always 
fails with status 1.

Thanks,
David

