[StarCluster] loadbalancer not removing nodes

Silverstein herc.silverstein at schrodinger.com
Thu Nov 19 13:06:04 EST 2015


Hi,

I've been using the loadbalancer on a small cluster (up to 5 execute 
nodes plus the master).  The nodes are c3.8xlarge.  It spins nodes up 
and configures SGE fine, but it does not properly remove nodes 
automatically when the load goes down.

All of the nodes were removed from SGE as execute nodes.  However, all 
of the nodes were left running.  In addition, when I tried to remove a 
node manually with removenode, it generated errors.  I then had to 
forcibly remove the nodes with removenode -f.
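
To confirm this state (nodes still running but no longer execute 
hosts in SGE), one option is to compare the output of `qconf -sel` on 
the master with the node aliases shown by `starcluster listclusters`. 
The following is only a minimal sketch: the function name and the 
sample data are hypothetical, and it assumes that the SGE host names 
match the StarCluster node aliases (master, node001, ...).

```python
def orphaned_nodes(qconf_sel_output, running_aliases, master_alias="master"):
    """Return running node aliases that SGE no longer lists as execute hosts.

    qconf_sel_output: raw text of `qconf -sel` (one host name per line).
    running_aliases:  node aliases still running, collected by hand,
                      e.g. from `starcluster listclusters`.
    The master is skipped because it should never be removed.
    """
    sge_hosts = {
        line.strip() for line in qconf_sel_output.splitlines() if line.strip()
    }
    return sorted(
        alias
        for alias in running_aliases
        if alias != master_alias and alias not in sge_hosts
    )


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the cluster described here.
    qconf_output = "master\nnode001\n"
    running = ["master", "node001", "node002", "node003"]
    print(orphaned_nodes(qconf_output, running))
```

Any alias this returns is a candidate for manual removal with 
removenode (or removenode -f if the plain command errors out).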

starcluster --version
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster at mit.edu

0.95.6

The master node is running:
  lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 12.04.2 LTS
Release:        12.04
Codename:       precise

Unfortunately, it looks like my debug logs have been rotated, so I 
don't have a log from the time the problem happened.  Has anyone else 
run into this?  If so, do you know what's causing it and how to avoid it?

Thanks,

Herc