Hi All,

I've been getting a consistent error from multiple attempts to boot a cluster that references an EBS volume I created with "starcluster createvolume".

Here is the output from the most recent createvolume run; everything appears to go fine:

.starcluster mary$ alias sc=starcluster
.starcluster mary$ sc createvolume --name=usrsharejobs-cv5g-use1c 5 us-east-1c
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> No keypair specified, picking one from config...
>>> Using keypair: lapuserkey
>>> Creating security group @sc-volumecreator...
>>> No instance in group @sc-volumecreator for zone us-east-1c, launching one now.
Reservation:r-de9f8aa5
>>> Waiting for volume host to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.447 mins
>>> Checking for required remote commands...
>>> Creating 5GB volume in zone us-east-1c
>>> New volume id: vol-53600b22
>>> Waiting for new volume to become 'available'...
>>> Attaching volume vol-53600b22 to instance i-6b714b1b...
>>> Formatting volume...
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
327680 inodes, 1310720 blocks
65536 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=1342177280
40 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 33 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
mke2fs 1.41.14 (22-Dec-2010)

>>> Leaving volume vol-53600b22 attached to instance i-6b714b1b
>>> Not terminating host instance i-6b714b1b
*** WARNING - There are still volume hosts running: i-6b714b1b
*** WARNING - Run 'starcluster terminate volumecreator' to terminate *all* volume host instances once they're no longer needed
>>> Your new 5GB volume vol-53600b22 has been created successfully
>>> Creating volume took 1.871 mins

.starcluster mary$ sc terminate volumecreator

StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

Terminate EBS cluster volumecreator (y/n)? y
>>> Detaching volume vol-53600b22 from volhost-us-east-1c
>>> Terminating node: volhost-us-east-1c (i-6b714b1b)
>>> Waiting for cluster to terminate...
>>> Removing @sc-volumecreator security group

.starcluster mary$ sc listvolumes

<snip>

volume_id: vol-53600b22
size: 5GB
status: available
availability_zone: us-east-1c
create_time: 2013-02-12 13:12:16
tags: Name=usrsharejobs-cv5g-use1c
<snip>

So here is the subsequent attempt to boot a cluster that tries to mount the new EBS volume:

.starcluster mary$ sc start -b 0.25 -i m1.small -I m1.small -c jobscluster jobscluster
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

*** WARNING - ************************************************************
*** WARNING - SPOT INSTANCES ARE NOT GUARANTEED TO COME UP
*** WARNING -
*** WARNING - Spot instances can take a long time to come up and may not
*** WARNING - come up at all depending on the current AWS load and your
*** WARNING - max spot bid price.
*** WARNING -
*** WARNING - StarCluster will wait indefinitely until all instances (2)
*** WARNING - come up. If this takes too long, you can cancel the start
*** WARNING - command using CTRL-C. You can then resume the start command
*** WARNING - later on using the --no-create (-x) option:
*** WARNING -
*** WARNING - $ starcluster start -x jobscluster
*** WARNING -
*** WARNING - This will use the existing spot instances launched
*** WARNING - previously and continue starting the cluster. If you don't
*** WARNING - wish to wait on the cluster any longer after pressing CTRL-C
*** WARNING - simply terminate the cluster using the 'terminate' command.
*** WARNING - ************************************************************

*** WARNING - Waiting 5 seconds before continuing...
*** WARNING - Press CTRL-C to cancel...
5...4...3...2...1...
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Launching master node (ami: ami-4b9f0a22, type: m1.small)...
>>> Creating security group @sc-jobscluster...
Reservation:r-ba8c99c1
>>> Launching node001 (ami: ami-4b9f0a22, type: m1.small)
SpotInstanceRequest:sir-a05ae014
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for open spot requests to become active...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 6.245 mins
>>> The master node is ec2-54-242-244-139.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Attaching volume vol-53600b22 to master node on /dev/sdz ...
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
*** WARNING - Cannot find device /dev/xvdz for volume vol-53600b22
*** WARNING - Not mounting vol-53600b22 on /usr/share/jobs
*** WARNING - This usually means there was a problem attaching the EBS volume to the master node
<snip>

So, per the relevant past email threads, I'm using the createvolume command, and it still gives this error. I also tried creating the volume through the AWS console; the subsequent cluster boot fails at the same point, with the same problem of not finding the device.
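In case it helps with diagnosis, this is roughly what I plan to run on the master node the next time it happens, to see which block device names the kernel actually registered (the cluster name and device paths are just the ones from the output above):

.starcluster mary$ starcluster sshmaster jobscluster
# StarCluster attached the volume as /dev/sdz but then looked for /dev/xvdz,
# so list both naming schemes to see what actually exists
root@master:~# ls -l /dev/sd* /dev/xvd*
# cross-check the kernel's view of attached block devices
root@master:~# cat /proc/partitions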
I'd appreciate any suggestions.

Thanks much,
Lyn