[StarCluster] force starcluster run

Justin Riley jtriley at MIT.EDU
Sun Dec 5 20:58:06 EST 2010


Hi Adam,

> StarCluster rocks! Great job Justin et al.

Thanks a lot, glad you like it :D

> I was using starcluster (v. 0.9999) to start an 80 node spot instance cluster recently and ran into an issue.
>
> starcluster start -b 0.10 -s 80 SpotCluster

Wow, OK, I haven't tried with that many nodes before but it should work. 
Please be patient with the setup, I'd imagine this will take some time 
given the size of the cluster.

> It took a few minutes for the spots to open and the instances to be running.  StarCluster was still waiting on instances to come up so I ran the start command with --no-create
>
> starcluster start --no-create -s 80 SpotCluster
> starcluster start --no-create SpotCluster
>
> I can verify with the AWS console and the output of 'starcluster listclusters' that all 80 instances are up and running. Is there a way to force starcluster to run the install? Is starcluster checking something other than ec2-describe-instances, like ssh, to see if a node is up?
>
> Not sure if this is due to the cluster size, spot instances, or just an anomaly like one node not starting sshd.

StarCluster checks that there are CLUSTER_SIZE nodes in a 'running' 
state and whether ssh is up on all the 'running' nodes in the cluster 
when it is 'Waiting for cluster to start'. This is the reason why 
StarCluster is still waiting even though you see all instances 'running' 
in ec2-describe-instances; ssh is likely not up yet for *all* instances 
even though they're all in a 'running' state. There really can't be a 
'force install' because StarCluster has to be able to connect to all 
nodes in the cluster via ssh before it can do anything with the instances.
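
To make the two conditions concrete, here is a rough sketch in Python. It 
is not StarCluster's actual code: the function names, the (state, host) 
tuples, and the idea of probing TCP port 22 are all assumptions made for 
illustration, and an open port is a weaker test than a real ssh login.

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Hypothetical probe: True if a TCP connection to host:port succeeds.

    This only shows that something is listening on the port; it does not
    prove that an ssh login would work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


def cluster_is_up(nodes, cluster_size, probe=ssh_port_open):
    """nodes is a list of (state, host) tuples, e.g. ('running', '10.0.0.5').

    The cluster counts as up only when at least cluster_size nodes are
    'running' AND the probe succeeds on every one of those running nodes.
    """
    running = [host for state, host in nodes if state == 'running']
    if len(running) < cluster_size:
        return False
    return all(probe(host) for host in running)
```

With 80 nodes, a single instance whose sshd has not started yet is enough 
to make cluster_is_up() return False, even though every instance reports 
'running'.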

That said, I'd also expect checking ssh on all the nodes to take some 
time, so if you cancel too early you may not give StarCluster enough 
time to connect to all 80 nodes. How long did you wait for StarCluster 
before canceling the run?
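
If you want to wait for a fixed amount of time from your own script 
rather than watching the terminal, a generic polling loop along these 
lines would do. Again this is only a sketch and not part of StarCluster: 
wait_until and its parameters are hypothetical, and the 30 minute timeout 
and 30 second interval are arbitrary guesses, not measured values for an 
80 node cluster.

```python
import time


def wait_until(check, timeout=1800.0, interval=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or timeout expires.

    Returns True if check() succeeded, False if the deadline passed first.
    sleep and clock are injectable so the loop can be tested without
    actually waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Here check would be whatever readiness test you trust, for example a 
function that tries to reach ssh on every node.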

Also, you mentioned you're using version 0.9999. When did you last 
pull/install the changes from GitHub?

> I'll try again today; I just wanted to see if I could buy a clue from the list.

OK great. If you don't mind, please report whether you're successful or 
not. I'm very interested to know...

~Justin


