[StarCluster] 100 nodes cluster

Luis M. Carril lmcarril at cesga.es
Mon Oct 17 05:48:23 EDT 2011


Hi,
     Although I've never tested a deployment that big, I've had a lot of 
problems with 10-20 node deployments. One or two machines always hang 
while booting or deploying, which is pretty annoying: I can't fully 
automate the cluster deployment because I have to watch it and stop 
or restart the failing nodes.
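
The manual watching described above could in principle be scripted. What follows is only a minimal sketch in Python, not something StarCluster provides: the `Node` record, the `find_stuck_nodes` helper, and the 600-second timeout are all hypothetical names and values chosen for illustration, and the node states would have to come from whatever EC2 query you already use.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical record for illustration; not a StarCluster or EC2 type.
    node_id: str
    state: str          # e.g. "pending" or "running"
    launched_at: float  # launch time, in seconds since the epoch


def find_stuck_nodes(nodes, now, timeout=600.0):
    """Return the ids of nodes still 'pending' more than `timeout` seconds after launch."""
    return [
        n.node_id
        for n in nodes
        if n.state == "pending" and now - n.launched_at > timeout
    ]


if __name__ == "__main__":
    # Made-up sample data: only node002 has been pending longer than the timeout.
    sample = [
        Node("node001", "running", 0.0),
        Node("node002", "pending", 0.0),
        Node("node003", "pending", 500.0),
    ]
    print(find_stuck_nodes(sample, now=700.0))
```

A watchdog loop would call such a helper periodically and then stop or restart the returned nodes; that part is left out because it depends on the tooling in use.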

Best regards
Luis M Carril

On 14/10/2011 16:46, Paolo Di Tommaso wrote:
> Hi All,
>
> I've tried to set up a cluster of 100 nodes with quite powerful machines (Hi-Mem double extra large configuration), but it ended in total failure.
>
> The overall configuration process was extremely slow. Five instances stayed blocked in the pending state for more than 10 minutes, so I had to terminate them manually.
>
> Other machines also returned error codes, for example when mounting /home and setting up SGE components.
>
> I had to stop the initialization phase manually after more than 30 minutes, because it seemed to hang.
>
>
> I'm not blaming StarCluster; it is a really nice piece of software. The problem really seems to be the Amazon infrastructure, which has a lot of latency and unreliable behavior.
>
>
> What is your opinion on this? Is anyone successfully running a "big" cluster using StarCluster?
>
>
>
>
> Thank you,
>
> Paolo Di Tommaso
> Software Engineer
> Comparative Bioinformatics Group
> Centre de Regulacio Genomica (CRG)
> Dr. Aiguader, 88
> 08003 Barcelona, Spain
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster

-- 
Luis M. Carril
Project Technician
Galicia Supercomputing Center (CESGA)
Avda. de Vigo s/n
15706 Santiago de Compostela
SPAIN

Tel: 34-981569810 ext 249
lmcarril at cesga.es
www.cesga.es

