[StarCluster] starcluster plugin status code 127

Justin Riley jtriley at MIT.EDU
Wed Dec 21 11:55:17 EST 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Wei,

The problem is that qconf is not in your default PATH when running
execute, which is why the shell reports status 127 (command not found).
SGE is installed in /opt/sge6 and relies on /etc/profile.d to set up the
paths correctly. Unfortunately these configs aren't automatically loaded
when executing commands over SSH. For now you can fix this using:

node.ssh.execute('source /etc/profile && qconf -mattr queue load_thresholds np_load_avg=1.5 all.q')
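
Applied to the plugin from the original message, the workaround could
look roughly like the sketch below. The with_profile helper is a
hypothetical name introduced here for illustration and is not part of
StarCluster; the ImportError fallback is only there so the sketch can be
loaded without StarCluster installed.

```python
try:
    from starcluster.clustersetup import ClusterSetup
except ImportError:
    # Assumption: fall back to a plain base class so the sketch can be
    # read and imported on a machine without StarCluster installed.
    ClusterSetup = object


def with_profile(cmd):
    """Hypothetical helper: source /etc/profile before running cmd.

    Sourcing /etc/profile pulls in the /etc/profile.d scripts that put
    the SGE binaries (such as qconf) on the PATH.
    """
    return 'source /etc/profile && %s' % cmd


class SgeConfig(ClusterSetup):
    def __init__(self, queue_to_config):
        self.queue_to_config = queue_to_config

    def run(self, nodes, master, user, user_shell, volumes):
        cmd = ('qconf -mattr queue load_thresholds np_load_avg=1.5 %s'
               % self.queue_to_config)
        # Same command as before, but with the profile sourced first.
        master.ssh.execute(with_profile(cmd))
```

The only change from the original plugin is the prefix added to the
command string before it is handed to master.ssh.execute.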

In the upcoming version you can pass source_profile=True as a parameter
to execute which will do this for you.
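
The 127 status can be reproduced locally without a cluster. The sketch
below assumes Python 3 and a POSIX /bin/sh; the command name
'no-such-command-xyz' is made up and simply stands in for a qconf that
is not on the PATH.

```python
import subprocess


def exit_status(command):
    """Run command through the shell and return its exit status."""
    return subprocess.call(
        command,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


if __name__ == '__main__':
    # A command the shell cannot find on PATH.
    print(exit_status('no-such-command-xyz'))
    # A command the shell can find.
    print(exit_status('true'))
```

On a typical Linux or Mac system this should print 127 for the missing
command and 0 for the one that exists.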

With that said, I'm working on an SGE (now OGS) plugin which will
allow you to add custom SGE settings like the one you're applying
above. For those interested in contributing, you can see the latest
progress in the 'sge-plugin' branch on GitHub:

https://github.com/jtriley/StarCluster/blob/sge-plugin/starcluster/plugins/sge.py

This will allow you to do what your plugin is doing now:

[plugin sge]
setup_class = starcluster.plugins.sge.SGEPlugin
load_avg = 1.5
create_queues = myqueue, gpu
scheduling_interval = 5
....

If anyone's interested in contributing patches that implement such
options, please fork the project on GitHub, check out the sge-plugin
branch, make the changes to SGEPlugin, and submit a pull request.

~Justin

On 12/21/11 3:33 AM, Wei Tao wrote:
> Hi Don,
>
> The plugin picked up the queue_to_config (all.q) as evidenced in the
> error message:
>
> !!! ERROR - command 'qconf -mattr queue load_thresholds np_load_avg=1.5 *all.q*' failed with status 127
>
> My intention is to configure SGE at cluster boot time using the
> plugin. Since I executed "starcluster runplugin" after the cluster
> had already booted, it is apparently not an issue of plugin execution
> timing.
>
> The only reason I run the plugin or the plugin command after the
> cluster has already booted is for debugging purposes.
>
> It's just very strange to me that as root I can execute the exact same
> command on the master node without any issue, but running it as a
> starcluster plugin fails.
>
> Also, what is status 127 anyway??
>
> Thanks!
>
> -Wei
>
>
> On Wed, Dec 21, 2011 at 1:42 AM, Don MacMillen <macd at nimbic.com> wrote:
>
> The only difference that I can see is that I have not used arguments to
> the plugin. I guess you did remember to set the argument "queue_to_config"
> in your config file?
>
> Another possible issue is if you are trying to reconfig a cluster that
> is just in the process of coming up. If you try that command early on,
> it will fail because sge has not been installed yet. Why do you want to
> config the cluster afterwards rather than just on the initial bring up?
> HTH and let us know what you find out.
> Regards.
>
> Don
>
>
> On Tue, Dec 20, 2011 at 10:02 PM, Wei Tao <wei.tao at tsibiocomputing.com> wrote:
>
> Hi all,
>
> I tried to implement the queue configuration suggested by Don MacMillen
> a while ago. Here is my plugin code:
>
> from starcluster.clustersetup import ClusterSetup
>
> class SgeConfig(ClusterSetup):
>     def __init__(self, queue_to_config):
>         self.queue_to_config = queue_to_config
>
>     def run(self, nodes, master, user, user_shell, volumes):
>         cmd_strg = 'qconf -mattr queue load_thresholds np_load_avg=1.5 %s' % self.queue_to_config
>         output = master.ssh.execute(cmd_strg)
>
> When I execute "starcluster runplugin <myplugin> <mycluster>", I got:
>
> >>> Running plugin <myplugin>
> !!! ERROR - command 'qconf -mattr queue load_thresholds np_load_avg=1.5 all.q' failed with status 127
>
> If I sshmaster and run the command directly as this:
>
> root at master:~# qconf -mattr queue load_thresholds np_load_avg=1.5 all.q
> root at master modified "all.q" in cluster queue list
>
> It works fine. Could someone please point out why the plugin would have
> a status code 127 when direct execution of the command apparently works
> fine?
>
> Thanks for the help!
>
>
> -Wei
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
> --
> Wei Tao, Ph.D.
> TSI Biocomputing LLC
> 617-564-0934

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7yD3QACgkQ4llAkMfDcrmaQQCcCrSNwpQt53aqTU96MiI9R839
3yYAn1P/CRJjQIvzWLfht3kd3a6mZI1M
=R7Fe
-----END PGP SIGNATURE-----
