[StarCluster] createvolume works / mount fails
Lyn Gerner
schedulerqueen at gmail.com
Tue Feb 12 22:44:18 EST 2013
I've read the code in clustersetup.py and retried this process with no tags
or any other non-essential data associated with the latest created volume,
vol-52fa8f23. Same failure mode as before:
.starcluster mary$ sc start -b 0.25 -i m1.small -I m1.small -c jobscluster jobscluster
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster at mit.edu
*** WARNING - ************************************************************
*** WARNING - SPOT INSTANCES ARE NOT GUARANTEED TO COME UP
*** WARNING -
*** WARNING - Spot instances can take a long time to come up and may not
*** WARNING - come up at all depending on the current AWS load and your
*** WARNING - max spot bid price.
*** WARNING -
*** WARNING - StarCluster will wait indefinitely until all instances (2)
*** WARNING - come up. If this takes too long, you can cancel the start
*** WARNING - command using CTRL-C. You can then resume the start command
*** WARNING - later on using the --no-create (-x) option:
*** WARNING -
*** WARNING - $ starcluster start -x jobscluster
*** WARNING -
*** WARNING - This will use the existing spot instances launched
*** WARNING - previously and continue starting the cluster. If you don't
*** WARNING - wish to wait on the cluster any longer after pressing CTRL-C
*** WARNING - simply terminate the cluster using the 'terminate' command.
*** WARNING - ************************************************************
*** WARNING - Waiting 5 seconds before continuing...
*** WARNING - Press CTRL-C to cancel...
5...4...3...2...1...
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Launching master node (ami: ami-4b9f0a22, type: m1.small)...
>>> Creating security group @sc-jobscluster...
Reservation:r-22c1d659
>>> Launching node001 (ami: ami-4b9f0a22, type: m1.small)
SpotInstanceRequest:sir-654c2614
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for open spot requests to become active...
1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Waiting for all nodes to be in a 'running' state...
2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Waiting for SSH to come up on all nodes...
2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
>>> Waiting for cluster to come up took 5.990 mins
>>> The master node is ec2-50-16-56-237.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Attaching volume vol-52fa8f23 to master node on /dev/sdz ...
>>> Configuring hostnames...
2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100%
*** WARNING - Cannot find device /dev/xvdz for volume vol-52fa8f23
*** WARNING - Not mounting vol-52fa8f23 on /usr/share/jobs/
*** WARNING - This usually means there was a problem attaching the EBS volume to the master node
<snip>
However, starcluster listclusters shows the volume attached to the master:
starcluster mary$ sc listclusters
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster at mit.edu
---------------------------------------------
jobscluster (security group: @sc-jobscluster)
---------------------------------------------
Launch time: 2013-02-12 18:51:26
Uptime: 0 days, 00:36:27
Zone: us-east-1c
Keypair: lapuserkey
EBS volumes:
vol-52fa8f23 on master:/dev/sdz (status: attached)
vol-e6e39697 on master:/dev/sda (status: attached)
vol-bce99ccd on node001:/dev/sda (status: attached)
Spot requests: 1 active
Cluster nodes:
master running i-859591f5 ec2-50-16-56-237.compute-1.amazonaws.com
node001 running i-679d9917 ec2-54-234-176-219.compute-1.amazonaws.com (spot sir-654c2614)
Total nodes: 2
...but on the master itself, neither /dev/sdz nor /dev/xvdz shows up:
[root@master ~]# ls /dev/sd*
/dev/sda /dev/sda1 /dev/sda2 /dev/sda3 /dev/sdad /dev/sdb
[root@master ~]# ls /dev/xvd*
/dev/xvdad /dev/xvde /dev/xvde1 /dev/xvde2 /dev/xvde3 /dev/xvdf
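
If I'm reading that right, the root volume attached as /dev/sda is showing up
as /dev/xvde, i.e. this kernel shifts sdX four letters into the xvd*
namespace, so the sdz attachment should land on /dev/xvdad (which does exist,
along with /dev/sdad). Assuming that's what's happening, here's roughly what
I plan to try by hand on the master to confirm it and work around the failed
mount (/dev/xvdad is my guess, not something StarCluster reported):

[root@master ~]# dmesg | grep -i xvd        # kernel log should name the device the attach created
[root@master ~]# blkid /dev/xvdad           # should report the filesystem createvolume made
[root@master ~]# mkdir -p /usr/share/jobs
[root@master ~]# mount /dev/xvdad /usr/share/jobs
[root@master ~]# df -h /usr/share/jobs      # should show the 5GB volume if the guess is right
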
Thanks again for any suggestions on how to get this volume to successfully
mount on the master.
Lyn
On Tue, Feb 12, 2013 at 1:50 PM, Lyn Gerner <schedulerqueen at gmail.com> wrote:
> Hi All,
>
> I've been receiving an error, consistently, from multiple attempts to boot
> a cluster that references an EBS volume that I've created w/"starcluster
> createvolume":
>
> Here is the output from the most recent createvolume; looks like
> everything goes fine:
>
> .starcluster mary$ alias sc=starcluster
> .starcluster mary$ sc createvolume --name=usrsharejobs-cv5g-use1c 5 us-east-1c
> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster at mit.edu
>
> >>> No keypair specified, picking one from config...
> >>> Using keypair: lapuserkey
> >>> Creating security group @sc-volumecreator...
> >>> No instance in group @sc-volumecreator for zone us-east-1c, launching one now.
> Reservation:r-de9f8aa5
> >>> Waiting for volume host to come up... (updating every 30s)
> >>> Waiting for all nodes to be in a 'running' state...
> 1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> >>> Waiting for SSH to come up on all nodes...
> 1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> >>> Waiting for cluster to come up took 1.447 mins
> >>> Checking for required remote commands...
> >>> Creating 5GB volume in zone us-east-1c
> >>> New volume id: vol-53600b22
> >>> Waiting for new volume to become 'available'...
> >>> Attaching volume vol-53600b22 to instance i-6b714b1b...
> >>> Formatting volume...
> Filesystem label=
> OS type: Linux
> Block size=4096 (log=2)
> Fragment size=4096 (log=2)
> Stride=0 blocks, Stripe width=0 blocks
> 327680 inodes, 1310720 blocks
> 65536 blocks (5.00%) reserved for the super user
> First data block=0
> Maximum filesystem blocks=1342177280
> 40 block groups
> 32768 blocks per group, 32768 fragments per group
> 8192 inodes per group
> Superblock backups stored on blocks:
> 32768, 98304, 163840, 229376, 294912, 819200, 884736
>
> Writing inode tables: done
> Creating journal (32768 blocks): done
> Writing superblocks and filesystem accounting information: done
>
> This filesystem will be automatically checked every 33 mounts or
> 180 days, whichever comes first. Use tune2fs -c or -i to override.
> mke2fs 1.41.14 (22-Dec-2010)
>
> >>> Leaving volume vol-53600b22 attached to instance i-6b714b1b
> >>> Not terminating host instance i-6b714b1b
> *** WARNING - There are still volume hosts running: i-6b714b1b
> *** WARNING - Run 'starcluster terminate volumecreator' to terminate *all* volume host instances once they're no longer needed
> >>> Your new 5GB volume vol-53600b22 has been created successfully
> >>> Creating volume took 1.871 mins
>
> .starcluster mary$ sc terminate volumecreator
> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster at mit.edu
>
> Terminate EBS cluster volumecreator (y/n)? y
> >>> Detaching volume vol-53600b22 from volhost-us-east-1c
> >>> Terminating node: volhost-us-east-1c (i-6b714b1b)
> >>> Waiting for cluster to terminate...
> >>> Removing @sc-volumecreator security group
>
> .starcluster mary$ sc listvolumes
> <snip>
>
> volume_id: vol-53600b22
> size: 5GB
> status: available
> availability_zone: us-east-1c
> create_time: 2013-02-12 13:12:16
> tags: Name=usrsharejobs-cv5g-use1c
>
> <snip>
>
> So here is the subsequent attempt to boot a cluster that tries to mount
> the new EBS volume:
>
> .starcluster mary$ sc start -b 0.25 -i m1.small -I m1.small -c jobscluster jobscluster
> StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
> Software Tools for Academics and Researchers (STAR)
> Please submit bug reports to starcluster at mit.edu
>
> *** WARNING - ************************************************************
> *** WARNING - SPOT INSTANCES ARE NOT GUARANTEED TO COME UP
> *** WARNING -
> *** WARNING - Spot instances can take a long time to come up and may not
> *** WARNING - come up at all depending on the current AWS load and your
> *** WARNING - max spot bid price.
> *** WARNING -
> *** WARNING - StarCluster will wait indefinitely until all instances (2)
> *** WARNING - come up. If this takes too long, you can cancel the start
> *** WARNING - command using CTRL-C. You can then resume the start command
> *** WARNING - later on using the --no-create (-x) option:
> *** WARNING -
> *** WARNING - $ starcluster start -x jobscluster
> *** WARNING -
> *** WARNING - This will use the existing spot instances launched
> *** WARNING - previously and continue starting the cluster. If you don't
> *** WARNING - wish to wait on the cluster any longer after pressing CTRL-C
> *** WARNING - simply terminate the cluster using the 'terminate' command.
> *** WARNING - ************************************************************
>
> *** WARNING - Waiting 5 seconds before continuing...
> *** WARNING - Press CTRL-C to cancel...
> 5...4...3...2...1...
> >>> Validating cluster template settings...
> >>> Cluster template settings are valid
> >>> Starting cluster...
> >>> Launching a 2-node cluster...
> >>> Launching master node (ami: ami-4b9f0a22, type: m1.small)...
> >>> Creating security group @sc-jobscluster...
> Reservation:r-ba8c99c1
> >>> Launching node001 (ami: ami-4b9f0a22, type: m1.small)
> SpotInstanceRequest:sir-a05ae014
> >>> Waiting for cluster to come up... (updating every 30s)
> >>> Waiting for open spot requests to become active...
> 1/1 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> >>> Waiting for all nodes to be in a 'running' state...
> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> >>> Waiting for SSH to come up on all nodes...
> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> >>> Waiting for cluster to come up took 6.245 mins
> >>> The master node is ec2-54-242-244-139.compute-1.amazonaws.com
> >>> Setting up the cluster...
> >>> Attaching volume vol-53600b22 to master node on /dev/sdz ...
> >>> Configuring hostnames...
> 2/2 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
> 100%
> *** WARNING - Cannot find device /dev/xvdz for volume vol-53600b22
> *** WARNING - Not mounting vol-53600b22 on /usr/share/jobs
> *** WARNING - This usually means there was a problem attaching the EBS volume to the master node
> <snip>
>
> So, per the relevant past email threads, I'm using the createvolume
> command, and it still gives this error. I also tried creating the volume
> through the AWS console; the subsequent cluster boot fails at the same
> point, with the same problem of not finding the device.
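>
> If it would help rule out a zone or status problem with the console-created
> volume, I can run something like this before the next attempt (the volume id
> below is just a placeholder, and ec2-describe-volumes assumes the
> ec2-api-tools are set up):
>
> .starcluster mary$ sc listvolumes
> .starcluster mary$ ec2-describe-volumes vol-xxxxxxxx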
>
> I'll appreciate any suggestions.
>
> Thanks much,
> Lyn
>
>
>