[StarCluster] starcluster starts but not all nodes added as exec nodes

Kyeong Soo (Joseph) Kim kyeongsoo.kim at gmail.com
Tue Mar 15 18:30:27 EDT 2011


Hi Jeff,

I have seen the same thing with a 50-node cluster (c1.xlarge):
only 29 of the 50 nodes were registered as exec hosts with SGE.
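
In case it helps narrow things down, here is a rough sketch of how one might
list the nodes that never registered, by comparing the expected node aliases
with the output of 'qconf -sel' on the master. It rests on a few assumptions
that are not confirmed in this thread: that nodes are aliased "master",
"node001", "node002", ..., that the master itself is also an exec host, and
that 'qconf -sel' prints one host per line. The function names are made up
for illustration and are not part of StarCluster.

```python
import subprocess


def expected_aliases(cluster_size):
    """Aliases ASSUMED for a cluster: 'master' plus node001..nodeNNN."""
    return ["master"] + ["node%03d" % i for i in range(1, cluster_size)]


def missing_exec_hosts(expected, qconf_sel_output):
    """Return the expected hosts that are absent from 'qconf -sel' output."""
    registered = set()
    for line in qconf_sel_output.splitlines():
        host = line.strip()
        if host:
            # Compare on the short name in case SGE reports a full hostname.
            registered.add(host.split(".")[0])
    return [h for h in expected if h not in registered]


def print_missing(cluster_size):
    """Run 'qconf -sel' locally (i.e. on the master) and print missing hosts."""
    out = subprocess.check_output(["qconf", "-sel"]).decode()
    for host in missing_exec_hosts(expected_aliases(cluster_size), out):
        print(host)
```

Calling print_missing(50) on the master would then print one alias per node
that SGE does not know about, which gives a list of the nodes to check for a
missing sge_execd.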

Regards,
Joseph

On Sat, Mar 5, 2011 at 10:15 PM, Jeff White <jeff at decide.com> wrote:
> I can frequently reproduce an issue where 'starcluster start' completes
> without error, but not all nodes are added to the SGE pool, which I verify
> by running 'qconf -sel' on the master. The latest example I have is creating
> a 25-node cluster, where only the first 12 nodes are successfully installed.
> The remaining instances are running and I can ssh to them but they aren't
> running sge_execd. There are only install log files for the first 12 nodes
> in /opt/sge6/default/common/install_logs. I have not found any clues in the
> starcluster debug log or the logs inside master:/opt/sge6/.
>
> I am running starcluster development snapshot 8ef48a3 downloaded on
> 2011-02-15, with the following relevant settings:
>
> NODE_IMAGE_ID = ami-8cf913e5
> NODE_INSTANCE_TYPE = m1.small
>
> I have seen this behavior with the latest 32-bit and 64-bit starcluster
> AMIs. Our workaround is to start a small cluster and progressively add nodes
> one at a time, which is time-consuming.
>
> Has anyone else noticed this, and does anyone have a better workaround or
> an idea for a fix?
>
> jeff


