[StarCluster] commlib error

Rajat Banerjee rajatb at post.harvard.edu
Tue Sep 23 09:33:33 EDT 2014


Hi Amanda,
It looks like you can no longer communicate with the master node. The
error appears because StarCluster failed to run the remote command
'source /etc/profile && qhost -xml', which returned a 'Connection refused'
error.

Can you paste the output of the following two commands?

> starcluster listclusters (should list the status of all your active
clusters and running nodes)

> starcluster sshmaster <your cluster name> (I'm expecting this to fail)
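
If sshmaster does turn out to work, one more check could be whether anything
is accepting connections on the qmaster port shown in the error. This is a
minimal sketch only, not part of StarCluster; the function name
port_accepts_connections is hypothetical, and the host "master" and port
63231 are simply taken from the error message.

```python
import socket


def port_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Hypothetical helper for illustration; it only tests TCP reachability
    and says nothing about whether the service behind the port is healthy.
    """
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Connection refused, timed out, or the host name did not resolve.
        return False
    conn.close()
    return True
```

Run on the master node, port_accepts_connections("master", 63231) returning
False would suggest that nothing is listening on that port.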

Raj

On Mon, Sep 22, 2014 at 5:13 PM, Amanda Joy Kedaigle <mandyjoy at mit.edu>
wrote:

>  Hi,
>
> I am trying to run starcluster's loadbalancer to keep only one node
> running until jobs are submitted to the cluster. I know it's an
> experimental feature, but I'm wondering if anyone has run into this error
> before, or has any suggestions. The cluster has been whittled down to 1
> node after a weekend of inactivity, and now it seems that when jobs are
> submitted to the queue, instead of adding nodes, SGE fails.
>
> >>> Loading full job history
> *** WARNING - Failed to retrieve stats (1/5):
> Traceback (most recent call last):
>   File "/net/dorsal/apps/python2.7/lib/python2.7/site-packages/StarCluster-0.95.5-py2.7.egg/starcluster/balancers/sge/__init__.py", line 552, in get_stats
>     return self._get_stats()
>   File "/net/dorsal/apps/python2.7/lib/python2.7/site-packages/StarCluster-0.95.5-py2.7.egg/starcluster/balancers/sge/__init__.py", line 522, in _get_stats
>     qhostxml = '\n'.join(master.ssh.execute('qhost -xml'))
>   File "/net/dorsal/apps/python2.7/lib/python2.7/site-packages/StarCluster-0.95.5-py2.7.egg/starcluster/sshutils.py", line 578, in execute
>     msg, command, exit_status, out_str)
> RemoteCommandFailed: remote command 'source /etc/profile && qhost -xml' failed with status 1:
> error: commlib error: got select error (Connection refused)
> error: unable to send message to qmaster using port 63231 on host "master": got send error
>
> Thanks for any help!
> Amanda