[StarCluster] SSH doesn't come up on c3.8xlarge instances

Wed Apr 23 01:26:33 EDT 2014

Hello Lyn and Niklas,

I start the cluster with the following command:

  starcluster start -c c3_8xlarge cluster

The relevant templates from my config are:

[cluster defcluster]
KEYNAME = thekey
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
DISABLE_QUEUE = True
# ami-6b211202 us-east-1 starcluster-base-ubuntu-13.04-x86_64-hvm
NODE_IMAGE_ID = ami-6b211202
NODE_INSTANCE_TYPE = m3.medium
CLUSTER_SIZE = 1
VOLUMES = latvol

[cluster c3_8xlarge]
EXTENDS = defcluster
NODE_INSTANCE_TYPE = c3.8xlarge
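
The effective c3_8xlarge template differs from defcluster only in NODE_INSTANCE_TYPE. The AMI, key, user and volume are therefore identical between the working and failing launches. The snippet below is a minimal sketch that checks this with Python's standard configparser. The helper resolve_cluster is hypothetical and only approximates how StarCluster follows EXTENDS; StarCluster's own config loader is not shown here. configparser also lowercases option names.

```python
import configparser

# The two templates from the message above, verbatim.
CONFIG_TEXT = """\
[cluster defcluster]
KEYNAME = thekey
CLUSTER_USER = sgeadmin
CLUSTER_SHELL = bash
DISABLE_QUEUE = True
# ami-6b211202 us-east-1 starcluster-base-ubuntu-13.04-x86_64-hvm
NODE_IMAGE_ID = ami-6b211202
NODE_INSTANCE_TYPE = m3.medium
CLUSTER_SIZE = 1
VOLUMES = latvol

[cluster c3_8xlarge]
EXTENDS = defcluster
NODE_INSTANCE_TYPE = c3.8xlarge
"""


def resolve_cluster(parser, name):
    """Return the effective options of [cluster <name>], following EXTENDS.

    Hypothetical helper: an illustration, not StarCluster's implementation.
    """
    section = parser["cluster " + name]
    parent = section.get("extends")
    # Start from the parent's settings (recursively), then apply our own.
    settings = resolve_cluster(parser, parent) if parent else {}
    settings.update({k: v for k, v in section.items() if k != "extends"})
    return settings


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
base = resolve_cluster(parser, "defcluster")
big = resolve_cluster(parser, "c3_8xlarge")

# Options whose value differs between the two templates.
diff = {k: (base.get(k), v) for k, v in big.items() if base.get(k) != v}
print(diff)  # {'node_instance_type': ('m3.medium', 'c3.8xlarge')}
```

This narrows the problem to the instance type itself rather than to any other setting in the template.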

I also tried rebooting the instance, and interestingly that doesn't work
either: it just doesn't reboot. I then stopped and started the instance
again, but nothing changed. I tried this both yesterday and today, so I
don't think it's a temporary issue.

Thanks!
Christian

On 23/04/2014 13:01, Lyn Gerner wrote:
> Hi Christian,
>
> Could you please advise, what is the exact command you're using to
> launch this cluster?
>
> Thanks, Lyn
>
>
> On Tue, Apr 22, 2014 at 5:30 PM, Christian Zielinski
> <email at czielinski.de> wrote:
>
> Dear all,
>
> StarCluster has proven to be an extremely useful tool in my research
> (thanks to everybody involved!), but I'm currently experiencing some
> strange behavior. I run all my instances in the us-east-1 region with
> the AMI:
>
> ami-6b211202 us-east-1 starcluster-base-ubuntu-13.04-x86_64-hvm
> (HVM-EBS)
>
> That works very well for c3.{large, xlarge, 2xlarge, 4xlarge}
> instances. However, when I start a c3.8xlarge instance with the same
> AMI, SSH never comes up.
>
> Specifically, after starting a c3.8xlarge instance, StarCluster hangs
> at this point:
>
>>>> Waiting for SSH to come up on all nodes...
> 0/1 |                                           |   0%
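
At a minimum, the "Waiting for SSH" step is waiting for the node to accept connections on TCP port 22. A plain TCP probe like the sketch below can test that by hand. It assumes Python 3. port_accepts_connections and wait_for_port are hypothetical helpers, not StarCluster functions. StarCluster's own check may go further than a bare TCP connect, for example by attempting an actual SSH login.

```python
import socket
import time


def port_accepts_connections(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS errors are all OSError subclasses.
        return False


def wait_for_port(host, port=22, attempts=30, delay=10.0, timeout=5.0):
    """Poll host:port until it accepts a connection or the attempts run out."""
    for attempt in range(attempts):
        if port_accepts_connections(host, port, timeout):
            return True
        if attempt < attempts - 1:
            # Only sleep between attempts, not after the last one.
            time.sleep(delay)
    return False
```

Pass the instance's public DNS name as host. The system status check may pass while this probe keeps failing after a stop/start. In that case, sshd is most likely never starting inside the guest.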
>
> In the AWS web console I can find the following message under
> "Instance Status Checks" (the system status check is fine):
>
> "Instance reachability check failed at April 23, 2014 11:08:00 AM
> UTC+8 (7 minutes ago)"
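
On EC2, the system status check covers the underlying AWS host and network. The instance status check, including its reachability test, covers the guest operating system. A passing system check combined with a failed instance reachability check therefore points at something inside the instance, such as the kernel or its drivers, rather than at AWS hardware. The sketch below encodes that rule for a response shaped like the one returned by boto3's describe_instance_status. boto3 is an assumption here. The names diagnose_status and fetch_status, the verdict strings and the instance ID are hypothetical.

```python
def diagnose_status(response):
    """Map each instance ID in a describe_instance_status response to a verdict."""
    verdicts = {}
    for entry in response.get("InstanceStatuses", []):
        system = entry.get("SystemStatus", {}).get("Status")
        instance = entry.get("InstanceStatus", {}).get("Status")
        if system == "impaired":
            # AWS side: the host or its network is unhealthy.
            verdict = "system-impaired"
        elif instance == "impaired":
            # Guest side: the OS is unreachable although the host is fine.
            verdict = "instance-impaired"
        elif system == "ok" and instance == "ok":
            verdict = "healthy"
        else:
            # e.g. "initializing", "insufficient-data" or "not-applicable".
            verdict = "pending"
        verdicts[entry["InstanceId"]] = verdict
    return verdicts


def fetch_status(instance_id, region="us-east-1"):
    """Fetch live status checks (requires boto3 and AWS credentials)."""
    import boto3  # imported here so diagnose_status() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    # IncludeAllInstances also reports instances that are not running.
    return ec2.describe_instance_status(
        InstanceIds=[instance_id], IncludeAllInstances=True
    )


# Offline example mirroring the console message above.
sample = {
    "InstanceStatuses": [
        {
            "InstanceId": "i-00000000",  # hypothetical ID
            "SystemStatus": {"Status": "ok"},
            "InstanceStatus": {"Status": "impaired"},
        }
    ]
}
print(diagnose_status(sample))  # {'i-00000000': 'instance-impaired'}
```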
>
> What really confuses me is that everything works fine on all the
> other instance types; only c3.8xlarge seems to be affected. Could
> this be a compatibility issue with the AMI? Interestingly, with the
> AMI
>
> ami-52a0c53b us-east-1 starcluster-base-ubuntu-12.04-x86_64-hvm
> (HVM-EBS)
>
> everything works fine. However, I would like to use the newer AMI
> (among other reasons, for enhanced networking and HVM support).
> Is anybody else experiencing similar problems?
>
> Thank you and regards,
> Christian Zielinski
>
>
> P.S.: The system log ends with the following messages:
>
> [...]
> [    0.802898]  #30cpu 31 spinlock event irq 281
> [    0.818286] installing Xen timer for CPU 31
> [    0.820037]  #31
> [    0.835184] Brought up 32 CPUs
> [    0.836006] smpboot: Total of 32 processors activated (180284.38 BogoMIPS)
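
The log tail above stops about 0.84 s into boot, right after all 32 CPUs are brought up, and no later kernel or init messages appear. That suggests the boot stalls before userspace (and hence sshd) ever starts. This would fit a kernel/AMI compatibility problem on this instance size. To compare this against the log of a working c3.4xlarge, a small helper can pull out the last kernel timestamp and the reported CPU count. The summarize_boot_log function below is a hypothetical sketch. It assumes the usual "[ seconds.microseconds ]" kernel timestamp prefix.

```python
import re

# Kernel console lines look like "[    0.835184] Brought up 32 CPUs".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")
BROUGHT_UP = re.compile(r"Brought up (\d+) CPUs")


def summarize_boot_log(text):
    """Return (last kernel timestamp in seconds, CPUs brought up), None if absent."""
    stamps = [float(m.group(1)) for m in TIMESTAMP.finditer(text)]
    cpus = BROUGHT_UP.search(text)
    return (stamps[-1] if stamps else None, int(cpus.group(1)) if cpus else None)


# The tail of the c3.8xlarge system log quoted above.
LOG_TAIL = """\
[    0.802898]  #30cpu 31 spinlock event irq 281
[    0.818286] installing Xen timer for CPU 31
[    0.820037]  #31
[    0.835184] Brought up 32 CPUs
[    0.836006] smpboot: Total of 32 processors activated (180284.38 BogoMIPS)
"""

print(summarize_boot_log(LOG_TAIL))  # (0.836006, 32)
```

A healthy boot log would normally continue well past the first second, so a last timestamp this early is itself a useful signal.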
>
>
> -- 
> http://www.czielinski.de/
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>

-- 
http://www.czielinski.de/