[StarCluster] Error Report

Justin Riley jtriley at MIT.EDU
Tue Aug 23 17:13:18 EDT 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Josh,

My apologies for the delay. The error you submitted could be related to:

1. A spotty Internet connection
2. DoS attacks on your instances' SSH daemons

In the first case there's not much I can do; you really need to have a
solid Internet connection for StarCluster to work. I've played around
with auto-reconnecting, but it's a hack and likely to break in other, more
extravagant ways, so I've been hesitant to add it in.

However, if you lose your connection during a 'start' command there's no
need to destroy the cluster; just run 'restart' instead:

    $ starcluster restart mycluster

This will simply reboot the instances and reconfigure the cluster all
over again rather than terminating the instances and wasting instance hours.

I'm working on a solution for the second case, which is basically to
restrict SSH access to only those IP addresses that connect with valid
credentials. Essentially StarCluster would:

1. Figure out your current IP
2. Modify the security group permissions if necessary to allow SSH
access from your current IP

This would happen for each new IP you try to start/sshmaster/sshnode/etc
from.
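
To make the "if necessary" check in step 2 concrete, here is a rough
sketch in Python. It is not StarCluster's actual code: the function name
ssh_rule_needed and the idea of passing in the group's existing CIDR
rules as plain strings are assumptions made for illustration only.
Discovering your public IP and applying the rule to the EC2 security
group are left out.

```python
import ipaddress


def ssh_rule_needed(current_ip, allowed_cidrs):
    """Return the /32 CIDR to authorize for SSH, or None if the IP is
    already covered by one of the existing rules.

    Hypothetical helper: `allowed_cidrs` stands in for the CIDR ranges
    already permitted on port 22 in the cluster's security group.
    """
    addr = ipaddress.ip_address(current_ip)
    for cidr in allowed_cidrs:
        # strict=False tolerates entries with host bits set, e.g. "10.0.0.5/8"
        if addr in ipaddress.ip_network(cidr, strict=False):
            return None  # already allowed, nothing to modify
    # Only a single-host rule is needed for the current client
    return f"{addr}/32"
```

With this, a client at 203.0.113.7 would need a new rule if the group
only allows 198.51.100.0/24, and no change if 203.0.113.0/24 (or
0.0.0.0/0) is already allowed.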

HTH,

~Justin

On 08/05/2011 03:05 PM, josh katz wrote:
> This error occurred after I had submitted 10 jobs, 2 of which were
> never completed. So I deleted them and tried again, but those jobs
> also never finished. Then when I tried to restart the cluster
> this error appeared and asked me to send it. Thus this email.
> 
> Thanks, Josh
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5UF+4ACgkQ4llAkMfDcrnl+QCcCT4jfQNJa9Pbr2//BMZJSa/I
n/YAoJOkXCfqtGJMqjs0wDId/+vKxGPg
=vgOd
-----END PGP SIGNATURE-----


