[StarCluster] issues with adding multiple nodes to a running cluster

Justin Riley jtriley at MIT.EDU
Tue Jan 3 15:55:34 EST 2012


Uggh, this is totally a bug. Another user reported it on the GitHub issue
tracker and the issue has been fixed on GitHub. I'm releasing 0.93
today (skipping the 0.92.2 version given the amount of new stuff in this
release), which should fix this.

Will send an announcement once it's released. Stay tuned....

~Justin



On Tue 03 Jan 2012 03:53:39 PM EST, Wei Tao wrote:
> Hi all,
>
> From time to time, when I tried to add nodes to a running cluster
> using either the loadbalance or addnodes command, StarCluster would misfire.
> For example, I set "-a 5" in loadbalance:
>
> command:
>   starcluster loadbalance -m 20 -a 5 -n 1 <mycluster>
>
> here is what I got:
>
> >>> Loading full job history
> Cluster size: 10
> Queued jobs: 361
> Oldest queued job: 2012-01-03 20:13:56
> Avg job duration: 256 secs
> Avg job wait time: 167 secs
> Last cluster modification time: 2012-01-03 20:17:07
> >>> A job has been waiting for 963 sec, longer than max 900
> >>> *** ADDING 5 NODES at 2012-01-03 20:29:59.623917
> >>> Launching node(s): node010, node011, node012, node013, node014
> SpotInstanceRequest:sir-29586e14
> SpotInstanceRequest:sir-46e90414
> SpotInstanceRequest:sir-314a9814
> SpotInstanceRequest:sir-99387e14
> SpotInstanceRequest:sir-9ad72a14
> SpotInstanceRequest:sir-089dcc11
> SpotInstanceRequest:sir-09d28011
> SpotInstanceRequest:sir-64d4dc11
> SpotInstanceRequest:sir-45516411
> SpotInstanceRequest:sir-f2b31a11
> SpotInstanceRequest:sir-0198f214
> SpotInstanceRequest:sir-1db0a014
> SpotInstanceRequest:sir-49c97814
> SpotInstanceRequest:sir-94fdd414
> SpotInstanceRequest:sir-69db0014
> SpotInstanceRequest:sir-6f410612
> SpotInstanceRequest:sir-93c1c012
> SpotInstanceRequest:sir-e44c7c12
> SpotInstanceRequest:sir-dbc51012
> SpotInstanceRequest:sir-aa52dc12
> SpotInstanceRequest:sir-9f9e6811
> SpotInstanceRequest:sir-50053011
> SpotInstanceRequest:sir-33455211
> SpotInstanceRequest:sir-ffcdd011
> SpotInstanceRequest:sir-c1d7ee11
> >>> Waiting for node(s) to come up... (updating every 30s)
> >>> Waiting for open spot requests to become active...
> 34/34
> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
> >>> Waiting for all nodes to be in a 'running' state...
> 35/35
> |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%  
> >>> Waiting for SSH to come up on all nodes...
> ^C/35 |||||||||||||||||||||||||||||||||||||||||||||||||||||||          |  85%
>
> Instead of 5 nodes, 25 nodes were fired up. Has anyone experienced a
> similar issue? Is this a bug in the code, or did I miss something in my
> command?
>
> Thanks!
>
>
>
> -- 
> Wei Tao, Ph.D.
> TSI Biocomputing LLC
> 617-564-0934
>
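For comparison, here is a minimal sketch of how many nodes one balancing pass would be expected to add given the numbers in Wei's log. It assumes that the load balancer triggers only once the oldest queued job has waited longer than the threshold printed in the log (900 s), adds at most `-a` nodes per pass, and never grows the cluster past `-m`. The names `LoadBalancerSettings` and `nodes_to_add` are hypothetical and are not StarCluster's actual API.

```python
from dataclasses import dataclass


@dataclass
class LoadBalancerSettings:
    # Hypothetical container mirroring the flags in Wei's command:
    # -m (max nodes), -a (nodes to add per pass), -n (min nodes).
    max_nodes: int
    add_nodes_per_pass: int
    min_nodes: int  # not used when growing the cluster in this sketch
    max_wait_secs: int = 900  # assumed from "longer than max 900" in the log


def nodes_to_add(settings: LoadBalancerSettings, cluster_size: int,
                 oldest_wait_secs: int) -> int:
    """Nodes a single pass should launch under the assumptions above."""
    if oldest_wait_secs <= settings.max_wait_secs:
        return 0  # queue has not waited long enough; do nothing
    # Never exceed -m: only the remaining headroom may be filled.
    headroom = max(settings.max_nodes - cluster_size, 0)
    return min(settings.add_nodes_per_pass, headroom)


settings = LoadBalancerSettings(max_nodes=20, add_nodes_per_pass=5, min_nodes=1)
expected = nodes_to_add(settings, cluster_size=10, oldest_wait_secs=963)
launched = 25  # spot instance requests listed in the log
print(expected)  # 5
print(launched > expected, 10 + launched > settings.max_nodes)  # True True
```

Under these assumptions the pass should have launched 5 nodes; the 25 spot requests in the log instead brought the cluster to 35 instances, well past the `-m 20` limit, which is consistent with this being a bug rather than a problem with the command.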




More information about the StarCluster mailing list