<div dir="ltr">We are trying to run the loadbalancing when launching a cluster of QIIME AMI's (a software for analysis of next-gen sequencing data) and are running into some errors.<div><br></div><div style>The loadbalancing works well when running the StarCluster AMI but we have not been able to get QIIME (easily) installed on that AMI.</div>
<div style><br></div><div style>Below is the error. Any info would be great.</div><div style><br></div><div style>Thanks.</div><div style><br></div><div style><font face="Calibri, Verdana, Helvetica, Arial"><span style="font-size:10.5pt"><b><br>
StarCluster Configuration

####################################
## StarCluster Configuration File ##
####################################
[global]
DEFAULT_TEMPLATE=qiime
ENABLE_EXPERIMENTAL=True

###########################
## Defining Cluster      ##
###########################

[cluster QIIMETest]
# change this to the name of one of the keypair sections defined above
KEYNAME = StarCluster
# number of ec2 instances to launch
CLUSTER_SIZE = 2
# AMI to use for all cluster nodes
NODE_IMAGE_ID = ami-d5cc8fbc #FDA QIIME 11.10 image

# instance type for all cluster nodes
# (options: m1.medium, m3.2xlarge, cc2.8xlarge, m1.large, c1.xlarge, hs1.8xlarge, cr1.8xlarge, m1.small, c1.medium, cg1.4xlarge, m1.xlarge, m2.xlarge, hi1.4xlarge, t1.micro, m2.4xlarge, m2.2xlarge, m3.xlarge, cc1.4xlarge)
NODE_INSTANCE_TYPE = m2.2xlarge
VOLUMES = CFSANdata3
CLUSTER_SHELL = bash
CLUSTER_USER = ubuntu

#############################
## Configuring EBS Volumes ##
#############################

[volume CFSANdata3]
# attach the volume to /home/ubuntu/CFSANdata on the master node and NFS-share it to the worker nodes
VOLUME_ID = vol-xxxxxx
MOUNT_PATH = /home/ubuntu/CFSANdata

[plugin ipcluster]
SETUP_CLASS = starcluster.plugins.ipcluster.IPCluster

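For reference, once the cluster is up we sanity-check that the [volume CFSANdata3] section did what we expect, i.e. that the EBS volume is mounted at /home/ubuntu/CFSANdata on the master and exported over NFS to the worker. This is only a rough sketch using standard Linux tools (df, exportfs), nothing StarCluster-specific, and the root@master prompt is just illustrative:

$ starcluster sshmaster STAR-ELASTIC
# is the EBS volume mounted where MOUNT_PATH says it should be?
root@master:~# df -h /home/ubuntu/CFSANdata
# is that path (along with /home) listed among the NFS exports shared to the workers?
root@master:~# exportfs -v
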
Creating cluster

ubuntu@ip-10-181-159-232:~$ starcluster start -c QIIMETest STAR-ELASTIC
StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
Software Tools for Academics and Researchers (STAR)

>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 2-node cluster...
>>> Creating security group @sc-STAR-ELASTIC...
Reservation:r-f8c93594
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.148 mins
>>> The master node is ec2-50-19-65-196.compute-1.amazonaws.com
>>> Configuring cluster...
>>> Attaching volume vol-12183458 to master node on /dev/sdz ...
>>> Waiting for vol-12183458 to transition to: attached...
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Configuring hostnames...
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Mounting EBS volume vol-12183458 on /home/ubuntu/CFSANdata...
>>> Creating cluster user: ubuntu (uid: 1000, gid: 1000)
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): ubuntu
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home /home/ubuntu/CFSANdata
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.087 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for ubuntu
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 1 worker node(s)
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.018 mins
>>> Installing Sun Grid Engine...
1/1 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
2/2 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Configuring cluster took 0.867 mins
>>> Starting cluster took 2.060 mins

The cluster is now ready to use. To login to the master node
as root, run:

$ starcluster sshmaster STAR-ELASTIC

If you're having issues with the cluster you can reboot the
instances and completely reconfigure the cluster from
scratch using:

$ starcluster restart STAR-ELASTIC

When you're finished using the cluster and wish to terminate
it and stop paying for service:

$ starcluster terminate STAR-ELASTIC

Alternatively, if the cluster uses EBS instances, you can
use the 'stop' command to shutdown all nodes and put them
into a 'stopped' state preserving the EBS volumes backing
the nodes:

$ starcluster stop STAR-ELASTIC

WARNING: Any data stored in ephemeral storage (usually /mnt)
will be lost!

You can activate a 'stopped' cluster by passing the -x
option to the 'start' command:

$ starcluster start -x STAR-ELASTIC

This will start all 'stopped' nodes and reconfigure the
cluster.

Running the load balancer

ubuntu@ip-$ starcluster loadbalance -m 80 -a 2 -n 2 -d -w 60 STAR-ELASTIC
StarCluster - (http://star.mit.edu/cluster) (v. 0.94)
Software Tools for Academics and Researchers (STAR)

>>> Starting load balancer (Use ctrl-c to exit)
Maximum cluster size: 80
Minimum cluster size: 2
Cluster growth rate: 2 nodes/iteration

>>> Writing stats to file: /home/ubuntu/.starcluster/sge/STAR-ELASTIC/sge-stats.csv
>>> Loading full job history
*** WARNING - Failed to retrieve stats (1/5):
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/balancers/sge/__init__.py", line 536, in get_stats
    return self._get_stats()
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/balancers/sge/__init__.py", line 507, in _get_stats
    qstatxml = '\n'.join(master.ssh.execute(qstat_cmd))
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
    msg, command, exit_status, out_str)
RemoteCommandFailed: remote command 'source /etc/profile && qstat -u \* -xml -f -r' failed with status 2:
qstat: invalid option -- 'm'
qstat: conflicting options.
usage:
qstat [-f [-1]] [-W site_specific] [-x] [ job_identifier... | destination... ]
qstat [-a|-i|-r|-e] [-u user] [-n [-1]] [-s] [-G|-M] [-R] [job_id... | destination...]
qstat -Q [-f [-1]] [-W site_specific] [ destination... ]
qstat -q [-G|-M] [ destination... ]
qstat -B [-f [-1]] [-W site_specific] [ server_name... ]
*** WARNING - Retrying in 60s
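
In case it is relevant: the usage text in that error does not look to us like Grid Engine's qstat, so the next thing we planned to check is which qstat binary a login shell on the master actually picks up (the load balancer runs 'source /etc/profile && qstat ...'). A rough sketch of that check is below; the /opt/sge6 path is taken from the NFS export listing above, and the root@master prompt is only illustrative:

$ starcluster sshmaster STAR-ELASTIC
# which qstat is first on PATH for a login shell?
root@master:~# which qstat
# does its help/usage output identify it as Grid Engine's qstat or some other scheduler's?
root@master:~# qstat -help 2>&1 | head -n 3
# SGE's own binaries should live under the exported install directory
root@master:~# ls /opt/sge6/bin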