[StarCluster] dealing with EBS volumes
Justin Riley
jtriley at MIT.EDU
Mon Aug 6 12:46:28 EDT 2012
Hi Manal,
My apologies for the extreme delay. I'm still catching up on
responding to threads.
> 1. About EBS: I created an image of my modified instance using
> the command ec2-create-image while an EBS volume was attached.
> Then I changed the AMI in the configuration file to the new AMI ID
> and kept the mounting of the EBS volume, so I end up with 2 EBS
> volumes attached to instances of the new image. Is this normal?
> Should I create the image after detaching the volume first?
StarCluster uses Amazon's create-image API when creating new AMIs from
EBS-backed instances. This call will automatically snapshot any
attached EBS volumes and include them in the new AMI's "block device
mapping".
This means that any time you start a new instance from the new AMI, a
new EBS volume will be created from each snapshot in the AMI's block
device mapping and automatically attached to the instance. To prevent
extra volumes from being included in the AMI, detach all external EBS
volumes before creating the AMI.
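To check whether an existing AMI already carries extra volumes, you can
list the snapshots in its block device mapping. This is a minimal sketch
that assumes the modern AWS CLI ('aws') is installed and configured; it is
a different tool from the ec2-api-tools used in this thread, and the
function name 'ami_snapshot_ids' is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: print the EBS snapshot IDs in an AMI's block device mapping.
# Assumes the AWS CLI is installed and configured (hypothetical helper name).
ami_snapshot_ids() {
  local ami_id="$1"
  # Each EBS entry in BlockDeviceMappings names the snapshot a new volume is
  # created from at launch; ephemeral entries have no Ebs key and are skipped.
  aws ec2 describe-images \
    --image-ids "$ami_id" \
    --query 'Images[0].BlockDeviceMappings[].Ebs.SnapshotId' \
    --output text
}
```

For a clean EBS-backed image, 'ami_snapshot_ids <ami-id>' should print only
the root volume's snapshot; any additional snapshot IDs correspond to
volumes that will be created and attached on every launch.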
If you specify a list of volumes in your default cluster template and
then use starcluster to start the image host, the specified volumes
will be attached to the image host by default. In this case I would
recommend either temporarily commenting out the volumes list in the
default template, or creating an alternate template and using the
'-c' option of the start command to select it, e.g.:
[cluster image]
cluster_size = 1
keyname = mykey
node_instance_type = m1.small
node_image_id = ami-asdflkasdf
$ starcluster start -s 1 -c image -o image_host
I will add a note to the docs about this caveat with using the 'start'
command to launch the image host.
> 2. I am having problems detaching the volume while keeping the
> image running. I can't find the commands that do this, and when
> I used ec2-detach-volume, I caused more problems than I solved.
You can use ec2-detach-volume or the AWS console to detach volumes
from the image host. Make sure to unmount the volume before detaching
it. After detaching, wait for the volume to reach the 'available'
state before creating the AMI.
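The unmount, detach, and wait steps can be scripted. This is a minimal
sketch under several assumptions: it runs as root on the image host, the
modern AWS CLI is installed there with credentials (it stands in for
ec2-detach-volume here), and the function name 'detach_for_imaging' and its
arguments are hypothetical. 'aws ec2 wait volume-available' polls until EC2
reports the volume as 'available':

```shell
#!/usr/bin/env bash
# Sketch: unmount, detach, and wait for 'available' before creating an AMI.
# Assumptions: run as root on the image host; the AWS CLI ('aws') is
# installed there with credentials. Helper name and arguments are hypothetical.
detach_for_imaging() {
  local volume_id="$1" mount_point="$2"

  # 1. Unmount first so the filesystem is cleanly closed before detaching.
  if ! umount "$mount_point"; then
    echo "umount $mount_point failed; is something still using it?" >&2
    return 1
  fi

  # 2. Ask EC2 to detach the volume from the instance.
  aws ec2 detach-volume --volume-id "$volume_id" >/dev/null || return 1

  # 3. Poll until EC2 reports the volume as 'available' (fully detached).
  #    The waiter exits non-zero if it gives up before that happens.
  aws ec2 wait volume-available --volume-ids "$volume_id"
}
```

For example, 'detach_for_imaging vol-12345678 /data' (placeholder volume ID
and mount point). Once it returns successfully, create the AMI as usual,
e.g. with 'starcluster ebsimage <instance-id> <image-name>'.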
> 3. Also, I thought this EBS volume was shared in the sense that it is
> mounted so that all instances of the same cluster can read from and
> write to it. However, when I create a cluster of 2 instances, each
> one instantiates its own EBS volume from the starting volume in the
> configuration file. I am not sure if there is anything that can
> make this volume itself truly shared. All I can find here:
>
> http://aws.amazon.com/ebs/
>
> is that an EBS volume is attached to only one instance, and sharing
> is done by taking snapshots. This would be a manual process, or too
> much programming. I need something like a shared scratch volume for
> an MPI application.
The only way to share a volume across instances is through a network
file system. StarCluster uses NFS to share every volume listed in the
volumes setting of your config across the cluster. In your case, the
'extra' volumes you're seeing are created and attached because
external EBS volumes were attached when you created your new AMI.
StarCluster does not manage these; only volumes listed in your config
are NFS-shared across the cluster.
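To confirm the share on a running cluster, you can check the filesystem
type of the volume's mount path on each node. This is a minimal sketch
using GNU stat; '/data' stands in for whatever mount_path your volume
section uses, and 'is_nfs_mounted' is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: report whether a path on this node is served over NFS.
# Returns 0 if NFS, 1 if some other filesystem, 2 if the path can't be stat'ed.
is_nfs_mounted() {
  local path="$1" fstype
  # GNU stat: -f queries the filesystem, %T prints its type by name.
  fstype="$(stat -f -c %T "$path" 2>/dev/null)" || return 2
  [ "$fstype" = "nfs" ]
}
```

On the master the path is backed by the EBS volume itself, so expect a
local filesystem type there; on the other nodes 'is_nfs_mounted /data'
should succeed.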
> 4. Also, I searched for how to make an MPI application work on a
> number of instances, and couldn't find information about the
> machine file: whether one is provided by default in the EC2
> configuration, or whether I should build it manually from the
> instance IDs or other identifiers. If you could send me an example
> file, that would be great.
I would recommend using SGE to submit parallel jobs on the cluster.
You can easily submit a job that requests N processors on the cluster
without needing a hostfile:
$ qsub -b y -pe orte 50 /path/to/your/mpi/executable
See this section of the docs for more details (please read it in full):
http://web.mit.edu/star/cluster/docs/latest/plugins/sge.html#submitting-openmpi-jobs-using-a-parallel-environment
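The same request can also be written as a job script. This is a sketch
assuming Open MPI was built with SGE support, in which case mpirun takes
the granted hosts from SGE and no machine file is needed. The file name
'mpi_job.sh', the 'run_mpi_job' helper, and the '-cwd'/'-j y' options are
illustrative choices, not StarCluster requirements:

```shell
#!/usr/bin/env bash
# Hypothetical SGE job script (mpi_job.sh). Lines starting with '#$' are read
# by qsub as options; bash treats them as ordinary comments.
#$ -S /bin/bash
#$ -cwd
#$ -j y
#$ -pe orte 50

# Launch an MPI program on the slots SGE granted. SGE sets NSLOTS to the
# number of slots allocated by the parallel environment; an SGE-aware
# Open MPI mpirun reads the host list from SGE, so no machine file is passed.
run_mpi_job() {
  local exe="$1"
  if [ -z "${NSLOTS:-}" ]; then
    echo "NSLOTS is not set; submit this script with qsub" >&2
    return 1
  fi
  mpirun -np "$NSLOTS" "$exe"
}
```

Add a final line such as 'run_mpi_job /path/to/your/mpi/executable' and
submit with 'qsub mpi_job.sh'. Using NSLOTS keeps the process count in
step with the '-pe orte' request instead of hard-coding it twice.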
> thanks again for your support,
My pleasure :D
> P.S. I also get this error message, but it doesn't stop me from using
> ssh or terminating normally
What is your cluster_user setting in your config? Also, would you mind
opening $HOME/.starcluster/logs/debug.log, searching for 'Creating
cluster user', and sending the surrounding lines? This will give us
more info on what's happening.
These lines indicate that something weird is going on when creating
the cluster user:
!!! ERROR - command 'groupadd -o -g 1002 ubuntu' failed with status 9
!!! ERROR - command 'useradd -o -u 1002 -g 1002 -s `which bash` -m ubuntu' failed with status 6
Do you have cluster_user = ubuntu by chance? I need to look into how
cluster_user could show up as "None" in the log above...
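For what it's worth, these exit statuses are consistent with a name
clash. Per the groupadd(8) and useradd(8) man pages, groupadd status 9
means the group name is already in use, and useradd status 6 means the
requested group does not exist (GID 1002 was never created because
groupadd failed). Ubuntu cloud images ship with an 'ubuntu' account, which
would explain this if your AMI has one. A hypothetical helper to check on
a node:

```shell
#!/usr/bin/env bash
# Sketch: check whether a would-be cluster_user already exists on a node.
# Hypothetical helper; getent consults every configured NSS source.
check_cluster_user() {
  local name="$1" entry
  if entry="$(getent group "$name")"; then
    echo "group '$name' already exists (gid $(echo "$entry" | cut -d: -f3))"
  else
    echo "group '$name' does not exist"
  fi
  if entry="$(getent passwd "$name")"; then
    echo "user '$name' already exists (uid $(echo "$entry" | cut -d: -f3))"
  else
    echo "user '$name' does not exist"
  fi
}
```

If 'check_cluster_user ubuntu' reports an existing account, choosing a
different cluster_user (e.g. sgeadmin) should sidestep the clash.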
~Justin