[StarCluster] set_keepalive
David Stuebe
DStuebe at asascience.com
Fri Apr 25 11:19:16 EDT 2014
Hi Starcluster
I have been building some plugins that take a while to run because they build and install large libraries. As a result, I have seen ssh terminate my connection while the process is still running. This seems to return exit code 1, although the process continues on the cluster.
For my custom plugins I added the following line to apply a keepalive to the ssh transport:
def run(self, nodes, master, user, user_shell, volumes):
    …
    for node in nodes:
        node.ssh.transport.set_keepalive(30)
    …
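The loop above can also be pulled into a small helper so that several plugins share it. This is only a sketch: the name enable_keepalive and its interval parameter are made up for illustration, and it assumes each node exposes a paramiko transport as node.ssh.transport, as in the snippet above.

```python
def enable_keepalive(nodes, interval=30):
    """Ask every node's SSH transport to send a keepalive packet
    after `interval` seconds without traffic.

    Hypothetical helper: assumes node.ssh.transport is a paramiko
    Transport, whose set_keepalive(0) turns keepalives off again.
    """
    for node in nodes:
        node.ssh.transport.set_keepalive(interval)
    # Return how many nodes were configured, which is handy for logging.
    return len(nodes)
```

Inside a plugin's run method this would be called as enable_keepalive(nodes) before starting the long-running build.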
Setting this in each plugin works, but you might consider adding it somewhere in StarCluster itself, probably in the connect method of SSHClient:
https://github.com/jtriley/StarCluster/blob/develop/starcluster/sshutils.py#L100
Here are the relevant methods from paramiko:
https://github.com/paramiko/paramiko/blob/master/paramiko/packet.py#L175
https://github.com/paramiko/paramiko/blob/master/paramiko/transport.py#L762
Another step that would help is to set a longer disconnect timeout in the default /etc/ssh/sshd_config in the cluster AMI.
For instance, I have used one of my plugins to set:
ClientAliveInterval 600
ClientAliveCountMax 3
With these values sshd should wait about half an hour (3 x 600 seconds) before dropping a client that has stopped responding.
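As a rough sketch of the arithmetic and of how a plugin might generate the new file contents: the function names below are hypothetical, and the code only builds the text, leaving it to the plugin to write the file and restart sshd. It relies on sshd using the first value it finds for each keyword, so the settings are prepended rather than appended.

```python
def client_alive_timeout(interval, count_max):
    """Approximate number of seconds before sshd disconnects a client
    that has stopped answering its alive messages."""
    return interval * count_max


def with_client_alive(config_text, interval=600, count_max=3):
    """Return sshd_config text with the two settings placed at the top.

    Hypothetical helper: prepending means these values take precedence
    over any later global ClientAlive* lines in the existing file.
    """
    settings = "ClientAliveInterval %d\nClientAliveCountMax %d\n" % (
        interval, count_max)
    return settings + config_text
```

For the values above, client_alive_timeout(600, 3) gives 1800 seconds, which is the half hour mentioned.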
David Stuebe
Scientist & Software Engineer
55 Village Square Drive
South Kingstown, RI 02879-8248
Tel: +1 (401) 789-6224
Email: David.Stuebe at rpsgroup.com
www: asascience.com (http://www.asascience.com/) | rpsgroup.com (http://www.rpsgroup.com/)
A member of the RPS Group plc