[StarCluster] [Star cluster] error tolerance design when adding nodes

Jin Yu yujin2004 at gmail.com
Sun Jul 20 15:08:00 EDT 2014


Hello,

For example, I have found that it is not uncommon for one or two
instances to be unreachable after adding 50 instances to a cluster.
The progress bar gets stuck while waiting for SSH, and I have to manually
restart the problematic instances.
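
To make the idea concrete, here is a rough sketch of the kind of behavior I
have in mind. It is not StarCluster code: wait_for_ssh, ssh_port_open, and the
reboot callback are hypothetical names, and the timeouts are arbitrary. It
polls TCP port 22 on each new node, reboots a node that stays unreachable past
a deadline, and gives up on it after a fixed number of reboots instead of
waiting forever.

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ssh(hosts, reboot, deadline=300.0, poll=10.0, max_reboots=2,
                 is_up=ssh_port_open, sleep=time.sleep, clock=time.monotonic):
    """Wait until every host accepts SSH connections, or give up on it.

    `reboot(host)` is a caller-supplied (hypothetical) callback that restarts
    an instance. A host that stays unreachable for `deadline` seconds is
    rebooted, at most `max_reboots` times, and then reported as failed.
    Returns a (ready, failed) pair of host lists.
    """
    pending = {h: 0 for h in hosts}        # host -> reboots issued so far
    started = {h: clock() for h in hosts}  # host -> start of current wait
    ready, failed = [], []
    while pending:
        for host in list(pending):
            if is_up(host):
                ready.append(host)
                del pending[host]
            elif clock() - started[host] >= deadline:
                if pending[host] >= max_reboots:
                    # Out of retries: report it rather than block the cluster.
                    failed.append(host)
                    del pending[host]
                else:
                    reboot(host)
                    pending[host] += 1
                    started[host] = clock()
        if pending:
            sleep(poll)
    return ready, failed
```

With something like this, the nodes returned in `failed` could be terminated
and replaced while the rest of the cluster setup continues.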

I have not yet gone through the StarCluster code. Does StarCluster
already have some error-tolerance design for this situation?

Thanks!
Jin