[StarCluster] Error removing node from SGE
    David Erickson 
    derickso at stanford.edu
       
    Sat Feb  2 13:56:32 EST 2013
    
    
  
Hi All-
I am seeing the following in my logs when running loadbalancer and 
removing nodes:
*** WARNING - Removing node013: i-dc5dd8ac (ec2-54-234-124-79.compute-1.amazonaws.com)
 >>> Running plugin dnrc-cplex
 >>> Removing node node013 (i-dc5dd8ac)...
 >>> Removing node013 from known_hosts files
 >>> Removing node013 from /etc/hosts
 >>> Removing node013 from NFS
 >>> Removing node013 from SGE
!!! ERROR - command 'source /etc/profile && qconf -dconf node013' failed with status 1
 >>> Updating SGE parallel environment 'orte'
19/19 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
 >>> Adding parallel environment 'orte' to queue 'all.q'
 >>> Removing node node013 (i-dc5dd8ac)...
 >>> Removing node013 from known_hosts files
 >>> Removing node013 from /etc/hosts
 >>> Removing node013 from NFS
 >>> Canceling spot request sir-d69dda14
 >>> Terminating node: node013 (i-dc5dd8ac)
The node is eventually removed, but the 'qconf -dconf node013' step fails 
with status 1 every time a node is removed.
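
One way to narrow this down is to check whether the node has a host-local SGE configuration before trying to delete it. The sketch below is only an illustration, not StarCluster's own code: it assumes that `qconf -sconfl` prints one host name per line, that `qconf` is already on the PATH (the logged command sources /etc/profile first), and that the failure comes from a missing local configuration, which the log above does not confirm. The function names are hypothetical.

```python
import subprocess


def has_local_conf(host, sconfl_output):
    """Return True if `host` appears in the text printed by `qconf -sconfl`.

    Assumption: the listing contains one host name per line. Names may be
    fully qualified on some clusters, which this simple match ignores.
    """
    hosts = {line.strip() for line in sconfl_output.splitlines() if line.strip()}
    return host in hosts


def remove_local_conf(host, run=subprocess.run):
    """Delete the host-local configuration only if one is listed.

    Returns True when `qconf -dconf` ran and exited with status 0, and
    False when there was nothing to delete or a qconf call failed.
    `run` is injectable so the logic can be exercised without SGE.
    """
    listing = run(["qconf", "-sconfl"], capture_output=True, text=True)
    if listing.returncode != 0 or not has_local_conf(host, listing.stdout):
        return False
    result = run(["qconf", "-dconf", host], capture_output=True, text=True)
    return result.returncode == 0
```

Running `qconf -sconfl` by hand on the master just before a node is removed would show whether node013 is listed at all; if it is not, a non-zero exit from `qconf -dconf node013` would be consistent with the error above.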
Thanks,
David