[StarCluster] Loadbalancing error against ubuntu QIIME AMI

Rayson Ho raysonlogin at gmail.com
Wed Sep 4 12:18:29 EDT 2013


James,

One quick workaround: last time I looked at the load balancer, it
issued qstat only on the master node. So you should be able to run the
standard StarCluster AMI on the Grid Engine master host and the QIIME
AMI on the Grid Engine execution hosts by setting the following in
your cluster template:

MASTER_IMAGE_ID = <Standard StarCluster AMI>
NODE_IMAGE_ID = <QIIME AMI>

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html



On Sat, Aug 31, 2013 at 8:55 AM, Rayson Ho <raysonlogin at gmail.com> wrote:
> Hi James,
>
> The SGE load balancer needs the SGE executables to be in $PATH, and it
> looks like the QIIME AMI does not have the modifications that
> StarCluster makes -- specifically /etc/profile.d/sge.sh, which sets the
> environment variables needed by the SGE qstat. (The "qstat" that
> complains about an invalid option is the PBS qstat that happens to be
> in the $PATH used to execute the command.)
>
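One way to confirm which qstat a node resolves is sketched below. It is a minimal Python 3 example (shutil.which needs 3.3 or later), and the function names are made up for illustration. It assumes the SGE binaries live somewhere below $SGE_ROOT, which is the usual layout but worth verifying on your image.

```python
import os
import shutil


def qstat_is_from_sge(qstat_path, sge_root):
    """Return True if qstat_path is located under sge_root.

    Both arguments are plain strings; either may be None or empty,
    in which case the answer is False.
    """
    if not qstat_path or not sge_root:
        return False
    qstat_real = os.path.realpath(qstat_path)
    root_real = os.path.realpath(sge_root)
    # Compare against "root/" so that /opt/sge6x does not match /opt/sge6.
    return qstat_real.startswith(root_real.rstrip(os.sep) + os.sep)


def describe_qstat():
    """Collect what the current environment resolves for qstat."""
    found = shutil.which("qstat")          # None if qstat is not on $PATH
    sge_root = os.environ.get("SGE_ROOT")  # None if sge.sh was never sourced
    return {
        "qstat": found,
        "SGE_ROOT": sge_root,
        "looks_like_sge": qstat_is_from_sge(found, sge_root),
    }
```

Running describe_qstat() on the master after "source /etc/profile" should show whether SGE_ROOT is set at all and whether the qstat that wins the $PATH lookup is the Grid Engine one.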
> I don't have access to the QIIME AMI (I tried to find the
> ami-d5cc8fbc AMI but couldn't -- is it available to the
> public?), but I believe there are at least 2 ways to fix it:
>
> 1) Patch the QIIME AMI - Just look at the execution host of
> StarCluster and see how /etc/profile.d/sge.sh is introduced into the
> default environment. Then create a new AMI based on the modified
> instance (would be easy if it is EBS-based -- it's just a few steps in
> the AWS Management Console).
>
> 2) Write a StarCluster plugin that fixes this $PATH problem on the
> fly, or even add the fix to the SGEPlugin so that, if the environment
> settings are not available, it injects them during the StarCluster-SGE
> bootstrap.
>
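A rough sketch of what option 2 could look like follows. Treat all of it as assumptions to check against your StarCluster version: the ClusterSetup base class and the run/on_add_node signatures follow the plugin interface as I remember it, and node.ssh.path_exists / node.ssh.remote_file are the SSH helpers I believe exist. The class name, /opt/sge6, "default" and linux-x64 are placeholders, and a real sge.sh may need more variables (ports, for instance), so copy those from a working StarCluster node.

```python
try:
    from starcluster.clustersetup import ClusterSetup
except ImportError:
    # Fallback so the sketch can be read and exercised without StarCluster.
    ClusterSetup = object

SGE_PROFILE = "/etc/profile.d/sge.sh"


def render_sge_profile(sge_root="/opt/sge6", sge_cell="default",
                       arch="linux-x64"):
    """Return shell text that exports a minimal SGE environment.

    The default values are placeholders (assumptions), not values read
    from any particular AMI.
    """
    bin_dir = "%s/bin/%s" % (sge_root, arch)
    lines = [
        'export SGE_ROOT="%s"' % sge_root,
        'export SGE_CELL="%s"' % sge_cell,
        # Prepend, so the SGE qstat shadows a PBS qstat already on the image.
        'export PATH="%s:$PATH"' % bin_dir,
    ]
    return "\n".join(lines) + "\n"


class SGEProfileFixer(ClusterSetup):
    """Hypothetical plugin: write sge.sh on nodes where it is missing."""

    def __init__(self, sge_root="/opt/sge6", sge_cell="default",
                 arch="linux-x64"):
        self.profile = render_sge_profile(sge_root, sge_cell, arch)

    def _fix(self, node):
        """Write the profile if absent; return True if a file was written."""
        if node.ssh.path_exists(SGE_PROFILE):
            return False
        remote = node.ssh.remote_file(SGE_PROFILE, "w")
        remote.write(self.profile)
        remote.close()
        return True

    def run(self, nodes, master, user, user_shell, volumes):
        for node in nodes:
            self._fix(node)

    def on_add_node(self, node, nodes, master, user, user_shell, volumes):
        # Also covers nodes that the load balancer adds later.
        self._fix(node)
```

The sketch prepends the SGE bin directory to $PATH instead of appending it, on the theory that an appended entry would still lose to a PBS qstat in /usr/bin. To try it, the module would be registered through a [plugin ...] section with setup_class pointing at the class and then listed in the cluster template's PLUGINS setting.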
> Rayson
>
>
> On Thu, Aug 29, 2013 at 8:14 AM, james pettengill <fixtgear at gmail.com> wrote:
>>
>> We are trying to run the load balancer when launching a cluster of QIIME AMIs (QIIME is software for analyzing next-gen sequencing data) and are running into some errors.
>
>
>> >>> Writing stats to file: /home/ubuntu/.starcluster/sge/STAR-ELASTIC/sge-stats.csv
>> >>> Loading full job history
>> *** WARNING - Failed to retrieve stats (1/5):
>> Traceback (most recent call last):
>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/balancers/sge/__init__.py", line 536, in get_stats
>>     return self._get_stats()
>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/balancers/sge/__init__.py", line 507, in _get_stats
>>     qstatxml = '\n'.join(master.ssh.execute(qstat_cmd))
>>   File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.94-py2.7.egg/starcluster/sshutils/__init__.py", line 555, in execute
>>     msg, command, exit_status, out_str)
>> RemoteCommandFailed: remote command 'source /etc/profile && qstat -u \* -xml -f -r' failed with status 2:
>> qstat: invalid option -- 'm'
>> qstat: conflicting options.
>> usage:
>> qstat [-f [-1]] [-W site_specific] [-x] [ job_identifier... | destination... ]
>> qstat [-a|-i|-r|-e] [-u user] [-n [-1]] [-s] [-G|-M] [-R] [job_id... | destination...]
>> qstat -Q [-f [-1]] [-W site_specific] [ destination... ]
>> qstat -q [-G|-M] [ destination... ]
>> qstat -B [-f [-1]] [-W site_specific] [ server_name... ]
>> *** WARNING - Retrying in 60s
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster at mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>

