[StarCluster] failed to add/remove additional nodes
Ryan Golhar
ngsbioinformatics at gmail.com
Thu Apr 3 14:32:45 EDT 2014
Hi all - I have a 50-node spot cluster running. I tried to add 10
additional nodes, and at some point during the process it failed. Only 2
nodes were added to the cluster, and they aren't receiving any SGE jobs.
I tried re-adding those nodes using '-x -a', but that fails. I then tried
to remove the nodes, and that fails as well. How do I fix this? Here's
the output:
[ec2-user@awsmicro plugins]$ starcluster removenode ngscluster node060
StarCluster - (http://star.mit.edu/cluster) (v. 0.9999)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster at mit.edu
>>> Running plugin tagger.TaggerPlugin
>>> Running plugin setupuserenv.SetupUserEnvironment
>>> Running plugin starcluster.plugins.users.CreateUsers
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Removing node060 from SGE
!!! ERROR - Error occured while running plugin 'starcluster.plugins.sge.SGEPlugin':
!!! ERROR - remote command 'source /etc/profile && qconf -dattr
!!! ERROR - hostgroup hostlist node060 @allhosts' failed with status 1:
!!! ERROR - error writing object "@allhosts" to spooling database
At this point, I have to go into the AWS web console and terminate the
nodes myself, since StarCluster isn't able to remove them.