[StarCluster] Large cluster (125 nodes) launch failure

Kyeong Soo (Joseph) Kim kyeongsoo.kim at gmail.com
Wed Mar 16 12:36:07 EDT 2011


Justin,

Again, many thanks for your valuable suggestions!
I will try those the next time I configure a large cluster. For now, I
have terminated everything, started from scratch, and am running five
25-node clusters (though the same issues persist even with this
configuration).

By the way, our posts have crossed each other in two different threads
(the difficulty of multi-threaded discussions?).
Please check the other thread, where I just responded to your post
with the requested log files.

Regards,
Joseph


On Wed, Mar 16, 2011 at 4:24 PM, Justin Riley <jtriley at mit.edu> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 03/16/2011 12:18 PM, Justin Riley wrote:
>> 1. wait for your jobs to finish and manually terminate the idle nodes to
>> stop paying for them in the mean time (tedious)
>
> You might also try on the idle nodes:
>
> $ cd /opt/sge6
> $ ./inst_sge -x -auto ./ec2_sge.conf
>
> I don't *think* this will affect any currently running jobs but I'm not
> 100% so if you're concerned I wouldn't recommend trying this.
>
> ~Justin
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.17 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk2A5CoACgkQ4llAkMfDcrmuhgCgkmTpoLKVZgwHJpOYUpMzi1dB
> qzUAnij2B/2ooh+kbDqU5bQTuJA2K44U
> =rmXZ
> -----END PGP SIGNATURE-----
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>


