Hi,

I very much appreciate the help I am getting from this mailing list.

I have a few points of confusion again:

1. About EBS: I created an image of my modified instance using ec2-create-image while an EBS volume was still attached. I then changed the AMI ID in my configuration file to the new AMI's ID, and I kept the EBS volume mount, so I end up with two EBS volumes attached to the new instances. Is this normal? Should I detach the volume before creating the image?
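For reference, this is roughly the sequence I have in mind (just a sketch; vol-xxxxxxxx and i-xxxxxxxx are placeholders for my actual volume and instance IDs):

    # detach the extra EBS volume first, so (as I understand it) the new
    # AMI is created without that volume in its block device mapping
    ec2-detach-volume vol-xxxxxxxx -i i-xxxxxxxx

    # then create the new AMI from the running instance
    ec2-create-image i-xxxxxxxx -n "my-modified-ami"

Is that the right order, or does ec2-create-image always include attached volumes regardless?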
2. I am having problems detaching the volume while keeping the instance running. I can't find a command that does this, and when I used ec2-detach-volume I caused more problems than I solved.
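From what I understand, the filesystem has to be unmounted inside the instance before ec2-detach-volume will work cleanly; something like the following, assuming the volume is mounted on /home from /dev/sdz as in my setup:

    # on the instance: unmount the filesystem first
    sudo umount /home

    # from my local machine: detach the now-unmounted volume
    ec2-detach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdz

Is this the right approach, or did my earlier ec2-detach-volume cause problems because I skipped the umount step?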
3. Also, I thought this EBS volume was shared, in the sense that it is mounted so that all instances in the same cluster can read and write to it. However, when I create a cluster of 2 instances, each one gets its own EBS volume created from the starting volume in the configuration file. I am not sure if there is anything that can make the volume itself truly shared. All I can find is here:
http://aws.amazon.com/ebs/

which says that an EBS volume can be attached to only one instance at a time, and that sharing is done by taking snapshots. That would be a manual process, or too much programming. I need something like a scratch volume that can be shared by an MPI application.
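My reading of the StarCluster docs is that a volume declared in the config is attached to the master only and then NFS-shared to the other nodes, which sounds like exactly the shared behavior I want. A sketch of what I think the config should look like (vol-xxxxxxxx is a placeholder; the section and option names are from my reading of the docs):

    [volume mydata]
    VOLUME_ID = vol-xxxxxxxx
    MOUNT_PATH = /data

    [cluster microSFMcluster]
    VOLUMES = mydata

Is that the intended mechanism, or am I misreading it?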
4. I also searched for how to run an MPI application across a number of instances, but couldn't find the information about the machine file: is one created by default in the EC2/StarCluster setup, or should I build it manually from the instance IDs or other identifiers? If you could send me an example file, that would be great.
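My guess, based on generic MPI documentation rather than anything StarCluster-specific, is that the machine file just lists the cluster host aliases one per line, e.g. for my 2-instance cluster:

    master
    node001

and is then passed to mpirun with something like:

    mpirun -np 2 -machinefile machines ./my_mpi_app

Is that right, and are the master/node001 aliases resolvable on the nodes by default?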
Thanks again for your support,
P.S. I also get the error message below, but it doesn't stop me from ssh-ing in and terminating normally.

$ starcluster start microSFMcluster
StarCluster - (http://web.mit.edu/starcluster) (v. 0.93.3)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: microSFMcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 1-node cluster...
>>> Creating security group @sc-microSFMcluster...
Reservation:r-e831c58d
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.334 mins
>>> The master node is ec2-67-202-55-20.compute-1.amazonaws.com
>>> Setting up the cluster...
>>> Attaching volume vol-69bd4807 to master node on /dev/sdz ...
>>> Configuring hostnames...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Mounting EBS volume vol-69bd4807 on /home...
>>> Creating cluster user: None (uid: 1002, gid: 1002)
!!! ERROR - command 'groupadd -o -g 1002 ubuntu' failed with status 9 | 0%
!!! ERROR - command 'useradd -o -u 1002 -g 1002 -s `which bash` -m ubuntu' failed with status 6
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): ubuntu
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Setting up NFS took 0.074 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for ubuntu
>>> Shutting down threads...
20/20 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
Traceback (most recent call last):
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cli.py", line 255, in main
    sc.execute(args)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/commands/start.py", line 194, in execute
    validate_running=validate_running)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cluster.py", line 1414, in start
    return self._start(create=create, create_only=create_only)
  File "<string>", line 2, in _start
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/utils.py", line 87, in wrap_f
    res = func(*arg, **kargs)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cluster.py", line 1437, in _start
    self.setup_cluster()
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cluster.py", line 1446, in setup_cluster
    self._setup_cluster()
  File "<string>", line 2, in _setup_cluster
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/utils.py", line 87, in wrap_f
    res = func(*arg, **kargs)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/cluster.py", line 1460, in _setup_cluster
    self.cluster_shell, self.volumes)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/clustersetup.py", line 350, in run
    self._setup_passwordless_ssh()
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/clustersetup.py", line 231, in _setup_passwordless_ssh
    auth_conn_key=True)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/node.py", line 411, in generate_key_for_user
    self.ssh.mkdir(ssh_folder)
  File "/Library/Python/2.6/site-packages/StarCluster-0.93.3-py2.6.egg/starcluster/sshutils/__init__.py", line 245, in mkdir
    return self.sftp.mkdir(path, mode)
  File "build/bdist.macosx-10.6-universal/egg/ssh/sftp_client.py", line 303, in mkdir
    self._request(CMD_MKDIR, path, attr)
  File "build/bdist.macosx-10.6-universal/egg/ssh/sftp_client.py", line 635, in _request
    return self._read_response(num)
  File "build/bdist.macosx-10.6-universal/egg/ssh/sftp_client.py", line 682, in _read_response
    self._convert_status(msg)
  File "build/bdist.macosx-10.6-universal/egg/ssh/sftp_client.py", line 708, in _convert_status
    raise IOError(errno.ENOENT, text)
IOError: [Errno 2] No such file

!!! ERROR - Oops! Looks like you've found a bug in StarCluster
!!! ERROR - Crash report written to: /Users/manal/.starcluster/logs/crash-report-2240.txt
!!! ERROR - Please remove any sensitive data from the crash report
!!! ERROR - and submit it to starcluster@mit.edu