[StarCluster] dealing with EBS volumes

Sergio Mafra sergiohmafra at gmail.com
Fri Sep 21 20:13:14 EDT 2012


Justino,

Regarding EBS images: what's the best approach once you've got all your software installed and need a new AMI of it? I've got an EBS volume attached to my cluster. If I go to the AWS console and use the create image option, will that work? Or do I have to start a 1-instance cluster and reinstall all the software again?

Best regards,

Sérgio Mafra


On 06/08/2012, at 13:46, Justin Riley <jtriley at mit.edu> wrote:

> Hi Manal,
> 
> My apologies for the extreme delay. I'm still catching up on
> responding to threads.
> 
>> 1. About EBS: I created the image of my modified instance using
>> the command ec2-create-image while an EBS volume was still
>> attached. Then I changed the AMI in the configuration file to the
>> new AMI ID, and I still keep the mounting of the EBS volume, so I
>> end up with 2 EBS volumes attached to the new instances. Is this
>> normal? Should I create the image after detaching the volume
>> first?
> 
> StarCluster uses Amazon's create-image API when creating new AMIs from
> EBS-backed instances. This call will automatically snapshot any
> attached EBS volumes and include them in the new AMI's "block device
> mapping".
> 
> This means any time you start a new instance from the new AMI, a new
> EBS volume will be created for each snapshot in the AMI's block
> device mapping and automatically attached to the instance. To prevent
> extra volumes from being included in the AMI you should detach all
> external EBS volumes before creating the AMI.
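> 
> If you want to verify what ended up in an AMI, one way (a sketch
> using the EC2 API tools; the AMI ID below is a placeholder) is to
> inspect its block device mapping:
> 
> $ ec2-describe-images ami-xxxxxxxx
> 
> Each BLOCKDEVICEMAPPING line in the output corresponds to a snapshot
> that will become a fresh EBS volume when an instance is launched.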
> 
> If you specify a list of volumes in your default cluster template and
> then use starcluster to start the image host, the specified volumes
> will be attached to the image host by default. In this case I would
> recommend either temporarily commenting out the volumes list in the
> default template or creating an alternate template and then using the
> '-c' option to the start command to specify it, e.g.:
> 
> [cluster image]
> cluster_size = 1
> keyname = mykey
> node_instance_type = m1.small
> node_image_id = ami-asdflkasdf
> 
> $ starcluster start -s 1 -c image -o image_host
> 
> I will add a note to the docs about this caveat when using the
> 'start' command to launch the image host.
> 
>> 2. I am having problems detaching the volume while keeping the
>> instance running. I can't find the commands to do this, and when
>> I used ec2-detach-volume I caused more problems than I solved.
> 
> You can use ec2-detach-volume or the AWS console to detach volumes
> from the image host. You need to make sure to unmount the volume
> before detaching. After detaching, you should wait for the volume to
> be in the 'available' state before creating the AMI.
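> 
> As a rough sketch, the full sequence might look like this (the volume
> ID, instance ID, and mount point are placeholders):
> 
> $ umount /mydata                      # on the node: unmount first
> $ ec2-detach-volume vol-xxxxxxxx      # then detach the volume
> $ ec2-describe-volumes vol-xxxxxxxx   # repeat until 'available'
> $ ec2-create-image i-xxxxxxxx -n my-new-ami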
> 
>> 3. Also, I thought this EBS volume is shared in the sense that it
>> is mounted so that all instances in the same cluster can read and
>> write to it. However, when I create a cluster of 2 instances, each
>> one instantiates its own EBS volume from the starting volume in
>> the configuration file. I am not sure if there is anything that
>> can make this volume itself truly shared. All I can find here:
>> 
>> http://aws.amazon.com/ebs/
>> 
>> is that EBS is attached to only one instance and sharing is done
>> by taking snapshots. That would be a manual process, or require
>> too much programming. I need something like a scratch volume that
>> is shared for an MPI application.
> 
> The only way volumes can be shared is through a network file share.
> StarCluster uses NFS to share all volumes specified in your volumes
> list in the config across the cluster. In your case you're seeing the
> 'extra' volumes being created and attached as a consequence of having
> external EBS volumes mounted when creating your new AMI. These are not
> handled by StarCluster. Only volumes listed in your config will be
> NFS-shared across the cluster.
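> 
> In other words, make sure the volume is declared in the config and
> referenced by your cluster template. A minimal sketch (volume name,
> ID, and mount path are placeholders):
> 
> [volume mydata]
> volume_id = vol-xxxxxxxx
> mount_path = /mydata
> 
> [cluster smallcluster]
> # ...other cluster settings...
> volumes = mydata
> 
> StarCluster will then attach the volume to the master node and
> NFS-share its mount path to all nodes in the cluster.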
> 
>> 4. Also, I searched for how to make an MPI application work on a
>> number of instances and couldn't locate information about the
>> machine file: whether it is created by default in the EC2 setup,
>> or whether I should build it manually from the instance IDs or
>> other identifiers. If you can send me an example file, that would
>> be great.
> 
> I would recommend using SGE to submit parallel jobs on the cluster.
> You can easily submit a job that requests N processors on the cluster
> without needing a hostfile:
> 
> $ qsub -b y -pe orte 50 /path/to/your/mpi/executable
> 
> See here for more details (please read that section in full):
> 
> http://web.mit.edu/star/cluster/docs/latest/plugins/sge.html#submitting-openmpi-jobs-using-a-parallel-environment
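> 
> If you prefer a job script over the one-liner, an equivalent sketch
> (the script name and executable path are placeholders) would be:
> 
> #!/bin/bash
> #$ -cwd
> #$ -pe orte 50
> mpirun /path/to/your/mpi/executable
> 
> $ qsub myjob.sh
> 
> With the SGE/OpenMPI integration on the cluster, mpirun picks up the
> allocated hosts and slot counts from SGE automatically, so no
> hostfile or -np flag is needed.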
> 
>> thanks again for your support,
> 
> My pleasure :D
> 
>> P.S. I also get this error message, but it doesn't stop me from
>> ssh'ing in and terminating normally
> 
> What is your cluster_user setting in your config? Also, would you
> mind opening $HOME/.starcluster/logs/debug.log, searching for
> 'Creating cluster user', and sending the surrounding lines? This
> will give us more info on what's happening.
> 
> These lines indicate that something weird is going on when creating
> the cluster user:
> 
> !!! ERROR - command 'groupadd -o -g 1002 ubuntu' failed with status 9
> !!! ERROR - command 'useradd -o -u 1002 -g 1002 -s `which bash` -m
> ubuntu' failed with status 6
> 
> Do you have cluster_user = ubuntu by chance? I need to look into how
> cluster_user could show up as "None" in the log above...
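> 
> For reference, groupadd exit status 9 usually means the group name is
> already in use, and useradd status 6 usually means the specified
> group does not exist. A quick way to check on a node might be:
> 
> $ getent group ubuntu    # non-empty output: group already exists
> $ getent passwd ubuntu   # non-empty output: user already exists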
> 
> ~Justin
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster


