[StarCluster] trouble with starting a large cluster

Rayson Ho raysonlogin at yahoo.com
Thu Sep 1 17:11:22 EDT 2011


In 0.92rc2, there's the addnode command, which lets you start with a small number of nodes and then grow the cluster.

"Adding and Removing Nodes from StarCluster"

http://web.mit.edu/stardev/cluster/docs/0.92rc2/manual/addremovenode.html
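
A rough sketch of growing a cluster this way, assuming "starcluster addnode <cluster_tag>" adds one node per call as the manual page above describes. The cluster tag "mycluster", the helper names, and the pause between calls are made up for illustration and are not part of StarCluster:

```python
import subprocess
import time


def addnode_commands(cluster_tag, extra_nodes):
    """Build one `starcluster addnode` command line per node to add."""
    if extra_nodes < 0:
        raise ValueError("extra_nodes must be non-negative")
    return [["starcluster", "addnode", cluster_tag] for _ in range(extra_nodes)]


def grow_cluster(cluster_tag, extra_nodes, pause_seconds=30,
                 run=subprocess.check_call):
    """Add nodes one at a time; an exception from `run` stops the loop."""
    commands = addnode_commands(cluster_tag, extra_nodes)
    for i, cmd in enumerate(commands):
        run(cmd)  # check_call raises CalledProcessError on a non-zero exit
        if i + 1 < len(commands):
            time.sleep(pause_seconds)  # hypothetical pause between additions
    return len(commands)


if __name__ == "__main__":
    # "mycluster" is a placeholder tag; substitute the tag of a running cluster.
    grow_cluster("mycluster", 5)
```

Adding nodes one call at a time means a single instance that fails to boot stops the loop at that point instead of failing the whole cluster start.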

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net


--- On Thu, 9/1/11, Rayson Ho <raysonlogin at yahoo.com> wrote:
> > #cli.py:1079 - ERROR - failed to connect to host
> > ec2-50-19-64-123.compute-1.amazonaws.com on port 22
> > 
> > Looking at the AWS console, I could see all 30 instances
> > were up and running. I even checked a few boot logs (e.g.
> > right click on an instance and choose the "Get System Log"
> > menu item), which all looked OK to me, granted I didn't
> > check all 30 logs...,
> 
> Can you check if "ec2-50-19-64-123" is stuck?
> 
> I believe that once in a while a VM on EC2 fails to start up,
> but rebooting the machine works around the issue. (It may
> be hardware-related or a bug in the EC2 provisioning
> layer.)
> 
> http://mailman.mit.edu/pipermail/starcluster/2011-April/000703.html
> 
> Rayson
> 
> =================================
> Grid Engine / Open Grid Scheduler
> http://gridscheduler.sourceforge.net
> 
> > maybe there is one instance having
> > trouble starting, like the above message suggests... I'm
> > guessing this could simply be a timeout issue, but I don't
> > know if/where there's a place I can change this. Does
> > StarCluster skip any instances that fail to come up?
> > 
> > And I'm using 0.91.2. I was hoping not to have to upgrade
> > (yet) as I need results fast and don't want to risk
> > breaking something during the upgrade. AWS gave me capacity
> > to run 400 instances, so I'm hoping this is an easily solved
> > problem and I would be able to use that capacity...
> > 
> > Appreciate any help!
> > 
> > fei
> > 
> > _______________________________________________
> > StarCluster mailing list
> > StarCluster at mit.edu
> > http://mailman.mit.edu/mailman/listinfo/starcluster
> > 
> 
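
The check suggested in the quoted reply (whether a particular host is stuck) can be approximated by probing port 22 on each instance. A minimal sketch using only Python's standard library; the function names and the 5-second timeout are illustrative and are not part of StarCluster:

```python
import socket


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def unreachable_hosts(hosts, port=22, timeout=5.0):
    """Return the hosts whose port could not be reached."""
    return [h for h in hosts if not ssh_port_open(h, port, timeout)]


if __name__ == "__main__":
    # Hostname taken from the error message quoted in this thread.
    hosts = ["ec2-50-19-64-123.compute-1.amazonaws.com"]
    for host in unreachable_hosts(hosts):
        print("no answer on port 22:", host)
```

A host that keeps showing up in this list while the AWS console reports it as running is a candidate for the reboot mentioned in the quoted reply.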


