[StarCluster] loadbalancer not removing nodes
Silverstein
herc.silverstein at schrodinger.com
Thu Nov 19 13:06:04 EST 2015
Hi,
I've been using the loadbalancer on a small cluster (up to 5 execute
nodes + the master). The nodes are c3.8xlarge. It seems to spin nodes
up and configure SGE OK, but automatic removal of nodes when the load
goes down is not working properly.
All of the nodes were removed from SGE as execute nodes. However, all
of the nodes were left running. In addition, when I tried to remove a
node manually with removenode, it generated errors, so I had to
forcibly remove the nodes with removenode -f.
starcluster --version
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster at mit.edu
0.95.6
The master node is running:
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise
Unfortunately, it looks like my debug logs have been rotated, so I
don't have a log from the time the problem happened. Has anyone else
run into this? If so, do you know what's causing it and how to avoid it?
Thanks,
Herc