<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<br>
<br>
Hi Hugh,<br>
<br>
After many tries I was able to reproduce this and can confirm it's a
transient issue related to polling for spot requests too quickly.<br>
<br>
I'm working on a patch now, but in the meantime, if this happens,
simply press CTRL-C to stop the 'start' command and then run the same
start command again with the -x option. The second run should work as
expected because by then enough time will have passed for the spot
instance requests to be available.<br>
<br>
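To illustrate the kind of check involved, here is a minimal sketch in
Python. It is not the actual StarCluster code or the upcoming patch: the
function name, its parameters, and the list_request_ids callable are all
made up for the example. The idea is to keep polling until every spot
request id that was just created is visible before deciding which nodes
to wait for.<br>
<br>
<pre>
```python
import time


def wait_for_spot_requests(list_request_ids, expected_ids, interval=1.0,
                           max_attempts=10, sleep=time.sleep):
    """Poll until every expected spot request id is visible, or give up.

    list_request_ids is a hypothetical callable standing in for whatever
    query returns the spot request ids that EC2 currently reports.
    Returns True once all expected ids are listed, False otherwise.
    """
    expected = set(expected_ids)
    for attempt in range(max_attempts):
        visible = set(list_request_ids())
        # A subset check avoids treating a partial listing as complete.
        if expected.issubset(visible):
            return True
        # Pause between attempts, but not after the final one.
        if attempt + 1 != max_attempts:
            sleep(interval)
    return False
```
</pre>
With the arbitrary defaults above (a one-second interval and ten
attempts), the sketch tolerates roughly nine seconds of lag before
giving up.<br>
<br>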
I've created an issue to keep track of this:<br>
<br>
<a class="moz-txt-link-freetext" href="http://web.mit.edu/star/cluster/issues/105">http://web.mit.edu/star/cluster/issues/105</a><br>
<br>
~Justin<br>
<br>
On 4/12/12 1:30 PM, MacMullan, Hugh wrote:<br>
<span style="white-space: pre;">><br>
> Folks:<br>
><br>
> <br>
><br>
> First: any good way to search the archives? I tried various
Google strings to no good effect. I hate to duplicate
effort/messages ...<br>
><br>
> <br>
><br>
> More importantly: A possible bug? Sometimes when starting
SPOT_BID clusters (~30% of the time?) I'm seeing 'start' skip
(apparently) 'Waiting for open spot requests to become active...'
and just process the master. When it works correctly, I see:<br>
><br>
> <br>
><br>
> >>> Launching node001 (ami: ami-12b6477b, type:
cc1.4xlarge)<br>
><br>
> SpotInstanceRequest:sir-9f38a214<br>
><br>
> >>> Launching node002 (ami: ami-12b6477b, type:
cc1.4xlarge)<br>
><br>
> SpotInstanceRequest:sir-c4505a11<br>
><br>
> >>> Launching node003 (ami: ami-12b6477b, type:
cc1.4xlarge)<br>
><br>
> SpotInstanceRequest:sir-cbb32414<br>
><br>
> >>> Waiting for cluster to come up... (updating
every 20s)<br>
><br>
> >>> Waiting for open spot requests to become
active...<br>
><br>
> 0/3 | | 0% <br>
><br>
> <br>
><br>
> When it doesn't work correctly, I see the following, where it
skips the highlighted section above and goes straight to 'Waiting
for all nodes', and the count is /1 instead of /4 (or whatever the
CLUSTER_SIZE is).<br>
><br>
> <br>
><br>
> # starcluster start -c spottest spottest<br>
><br>
> StarCluster - (<a class="moz-txt-link-freetext" href="http://web.mit.edu/starcluster">http://web.mit.edu/starcluster</a>) (v. 0.93.3)<br>
><br>
> Software Tools for Academics and Researchers (STAR)<br>
><br>
> Please submit bug reports to <a class="moz-txt-link-abbreviated" href="mailto:starcluster@mit.edu">starcluster@mit.edu</a><br>
><br>
> <br>
><br>
> >>> Validating cluster template settings...<br>
><br>
> >>> Cluster template settings are valid<br>
><br>
> >>> Starting cluster...<br>
><br>
> >>> Launching a 4-node cluster...<br>
><br>
> >>> Launching master node (ami: ami-12b6477b, type:
cc1.4xlarge)...<br>
><br>
> >>> Creating security group @sc-spottest...<br>
><br>
> >>> Opening tcp port range 22-22 for CIDR
XXXXXXXXXX/22<br>
><br>
> >>> Creating placement group @sc-spottest...<br>
><br>
> Reservation:r-02fbac61<br>
><br>
> >>> Launching node001 (ami: ami-12b6477b, type:
cc1.4xlarge)<br>
><br>
> SpotInstanceRequest:sir-6cb0f014<br>
><br>
> >>> Launching node002 (ami: ami-12b6477b, type:
cc1.4xlarge)<br>
><br>
> SpotInstanceRequest:sir-b0ff9e11<br>
><br>
> >>> Launching node003 (ami: ami-12b6477b, type:
cc1.4xlarge)<br>
><br>
> SpotInstanceRequest:sir-2ef6f814<br>
><br>
> >>> Waiting for cluster to come up... (updating
every 20s)<br>
><br>
> >>> Waiting for all nodes to be in a 'running'
state...<br>
><br>
> 1/1
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100% <br>
><br>
> >>> Waiting for SSH to come up on all nodes...<br>
><br>
> 1/1
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100% <br>
><br>
> >>> Waiting for cluster to come up took 3.547 mins<br>
><br>
> >>> The master node is
ec2-184-72-156-11.compute-1.amazonaws.com<br>
><br>
> <br>
><br>
> I haven't tried this with anything but 'bigger' stuff (cc1
& cc2), so don't know if that has any bearing on the
situation. My config:<br>
><br>
> <br>
><br>
> [global]<br>
><br>
> DEFAULT_TEMPLATE=Rcluster<br>
><br>
> ENABLE_EXPERIMENTAL=True<br>
><br>
> REFRESH_INTERVAL=20<br>
><br>
> <br>
><br>
> [aws info]<br>
><br>
> AWS_ACCESS_KEY_ID = XXXXXXXXXXXX<br>
><br>
> AWS_SECRET_ACCESS_KEY = XXXXXXXXXXXXX<br>
><br>
> AWS_USER_ID = XXXXXXXXXXX<br>
><br>
> EC2_CERT = XXXXXXXXXXX.pem<br>
><br>
> EC2_PRIVATE_KEY = XXXXXXXXXXXXX.pem<br>
><br>
> <br>
><br>
> [key mykey]<br>
><br>
> KEY_LOCATION=XXXXXXXXXXXXXXX.pem<br>
><br>
> <br>
><br>
> [cluster spottest]<br>
><br>
> KEYNAME = mykey<br>
><br>
> CLUSTER_SIZE = 4<br>
><br>
> CLUSTER_USER = sgeadmin<br>
><br>
> CLUSTER_SHELL = bash<br>
><br>
> NODE_IMAGE_ID = ami-12b6477b<br>
><br>
> NODE_INSTANCE_TYPE = cc1.4xlarge<br>
><br>
> AVAILABILITY_ZONE = us-east-1c<br>
><br>
> VOLUMES = Rlocal-spottest<br>
><br>
> PLUGINS = setup-centos<br>
><br>
> PERMISSIONS = ssh-local<br>
><br>
> SPOT_BID = 1.50<br>
><br>
> <br>
><br>
> [volume Rlocal-spottest]<br>
><br>
> VOLUME_ID = vol-XXXXXXXXXX<br>
><br>
> MOUNT_PATH = /usr/local<br>
><br>
> <br>
><br>
> [plugin setup-centos]<br>
><br>
> setup_class = setup-centos.PackageInstaller<br>
><br>
> pkg_to_install = R<br>
><br>
> <br>
><br>
> [permission ssh-local]<br>
><br>
> protocol = tcp<br>
><br>
> from_port = 22<br>
><br>
> to_port = 22<br>
><br>
> cidr_ip = XXXXXXXXXXX/22<br>
><br>
> <br>
><br>
> This exact config works sometimes, other times not. Thanks
for listening, or any advice you might have.<br>
><br>
> -Hugh<br>
><br>
><br>
><br>
> _______________________________________________<br>
> StarCluster mailing list<br>
> <a class="moz-txt-link-abbreviated" href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>
> <a class="moz-txt-link-freetext" href="http://mailman.mit.edu/mailman/listinfo/starcluster">http://mailman.mit.edu/mailman/listinfo/starcluster</a></span><br>
<br>
<br>
</body>
</html>