[StarCluster] Starcluster and elastic load balancing

Justin Riley justin.t.riley at gmail.com
Tue Apr 19 00:08:52 EDT 2011


Hi Joseph,

Sorry for the delay. The ssh-keyscan issue could certainly have played a role here, assuming the cluster was large enough, but it's hard to tell. If you encounter this again when testing the latest code, would you mind posting an issue on the GitHub page and attaching the relevant logs?
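
When collecting logs, it may also help to check whether /etc/hosts on the master still contains a repeated node alias, as in the node016 snippet quoted below. Here is a minimal sketch, assuming the layout shown in that snippet (IP address first, then one or more names). The helper find_duplicate_aliases is hypothetical and is not part of StarCluster.

```python
from collections import defaultdict


def find_duplicate_aliases(hosts_text):
    """Return {name: [ip, ...]} for every name that maps to more than one IP.

    Hypothetical helper for illustration only; not part of StarCluster.
    """
    ips_by_name = defaultdict(list)
    for line in hosts_text.splitlines():
        # Drop trailing comments, then skip blank or malformed lines.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        ip, names = fields[0], fields[1:]
        for name in names:
            if ip not in ips_by_name[name]:
                ips_by_name[name].append(ip)
    return {name: ips for name, ips in ips_by_name.items() if len(ips) > 1}


# The two entries reported in this thread.
sample = """\
10.76.91.4 ip-10-76-91-4.ec2.internal ip-10-76-91-4 node016
10.112.209.34 ip-10-112-209-34.ec2.internal ip-10-112-209-34 node016
"""
print(find_duplicate_aliases(sample))
```

On a live cluster the same function could be given the contents of /etc/hosts read from the master node.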

Thanks!

~Justin


On Apr 11, 2011, at 12:16 PM, Kyeong Soo (Joseph) Kim wrote:

> Justin,
> 
> Is this related to the ssh-keyscan problem that you mentioned in
> another thread (regarding node scalability)?
> 
> Anyhow, I will give it a try with the latest code as soon as I am
> ready for another round of simulation.
> 
> Regards,
> Joseph
> 
> On Wed, Apr 6, 2011 at 4:13 PM, Justin Riley <justin.t.riley at gmail.com> wrote:
>> Hi Joseph,
>> 
>> That's strange. This problem could be related to the fact that nodes
>> were failing to be added to SGE, which might have thrown the load
>> balancer logic off. In any event, would you mind testing the latest
>> code with the load balancer to see if this happens again?
>> 
>> Thanks!
>> 
>> ~Justin
>> 
>> On Tue, Mar 15, 2011 at 1:27 PM, Kyeong Soo (Joseph) Kim
>> <kyeongsoo.kim at gmail.com> wrote:
>>> Hi Rajat,
>>> 
>>> This is to report one strange behaviour I just encountered during the
>>> use of StarCluster with your loadbalancer (LB).
>>> Below is a snippet of the "/etc/hosts" file:
>>> 
>>> .......
>>> 10.76.91.4 ip-10-76-91-4.ec2.internal ip-10-76-91-4 node016
>>> 10.112.209.34 ip-10-112-209-34.ec2.internal ip-10-112-209-34 node016
>>> .....
>>> 
>>> My cluster initially started with 10 nodes, and I ran the LB with the
>>> maximum number of nodes set to 20.
>>>
>>> It seems that, in the middle of adding new nodes to the cluster, the
>>> LB added a new node with a duplicate host name (i.e. node016) for
>>> unknown reasons; I could see the second "node016" instance in the
>>> AWS management console, but found that it was not used by SGE.
>>> Fortunately, manually terminating the node through the console didn't
>>> affect SGE or the already running jobs.
>>> 
>>> With Regards,
>>> Joseph
>>> --
>>> Kyeong Soo (Joseph) Kim, Ph.D.
>>> Senior Lecturer in Networking
>>> Room 112, Digital Technium
>>> Multidisciplinary Nanotechnology Centre, College of Engineering
>>> Swansea University, Singleton Park, Swansea SA2 8PP, Wales UK
>>> TEL: +44 (0)1792 602024
>>> EMAIL: k.s.kim_at_swansea.ac.uk
>>> HOME: http://iat-hnrl.swan.ac.uk/ (group)
>>>             http://iat-hnrl.swan.ac.uk/~kks/ (personal)
>>> 
>>> 
>>> On Mon, Jan 31, 2011 at 9:09 AM, Kyeong Soo (Joseph) Kim
>>> <kyeongsoo.kim at gmail.com> wrote:
>>>> Hello Rajat,
>>>> I am very interested in your work on elastic load balancing; I
>>>> remember that you posted some graphs of early results in the past and that
>>>> you were working on your MSc thesis.
>>>> In fact, this new feature will be critical for my current research, which
>>>> requires about 300 to 400 independent simulation runs, and I highly
>>>> appreciate your great contribution to StarCluster.
>>>> By the way, I wonder whether you have published your work in any
>>>> conferences/journals yet.
>>>> Regards,
>>>> Joseph
>>>> 
>>>> 
>>>> On Fri, Jan 28, 2011 at 6:31 PM, Rajat Banerjee <rbanerj at fas.harvard.edu>
>>>> wrote:
>>>>> 
>>>>> Hi Archie,
>>>>> Yes, there is ELB built into the latest releases of StarCluster. I wrote
>>>>> it, so feel free to write me (+ the list) with any questions.
>>>>> The docs on
>>>>> http://web.mit.edu/stardev/cluster/docs/index.html
>>>>> haven't been updated in a while. There is a documentation page on
>>>>> starcluster in the code base, see
>>>>> /starcluster/StarCluster/docs/sphinx/load_balancer.rst
>>>>> That doc should have all of the information you need, and is readable in
>>>>> plain text.
>>>>> Typically, this is how I fire up the load balancer:
>>>>>     starcluster bal <cluster_tag> -m <MAX_NODES you want> -n <MIN_NODES you want>
>>>>> It will poll the cluster every 60 seconds and make decisions. The
>>>>> decisions are described in load_balancer.rst. There is a visualizer which
>>>>> makes 6 graphs with matplotlib to show you how many nodes are working, how
>>>>> many jobs are running, queued, avg load, etc, but the visualizer still needs
>>>>> a little bit of work.
>>>>> Hope that helps, and feel free to send back questions.
>>>>> Rajat Banerjee
>>>>> 
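
As a rough illustration of how the -m and -n bounds in the command above might constrain each polling decision, here is a minimal sketch. It assumes a naive rule that changes the cluster by at most one node per poll. The function name, its parameters, and the rule itself are hypothetical and are not taken from StarCluster; the actual decisions are the ones described in load_balancer.rst and may differ.

```python
def target_node_count(current_nodes, queued_jobs, idle_nodes, min_nodes, max_nodes):
    """Return the node count a naive balancer would aim for after one poll.

    Illustrative assumption only; NOT StarCluster's actual decision logic.
    """
    if queued_jobs > 0:
        # Jobs are waiting: try to grow by one node.
        wanted = current_nodes + 1
    elif idle_nodes > 0:
        # Nothing is queued and at least one node is idle: try to shrink by one.
        wanted = current_nodes - 1
    else:
        # Nothing is queued and every node is busy: leave the cluster alone.
        wanted = current_nodes
    # Never go below the -n value or above the -m value.
    return max(min_nodes, min(max_nodes, wanted))


# A 10-node cluster with queued jobs and a maximum of 20 grows by one node.
print(target_node_count(10, 5, 0, 10, 20))
```

Under this assumed rule a cluster already at the -m limit stays there even while jobs are queued, and a cluster at the -n limit is never shrunk further.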
>>>>> On Fri, Jan 28, 2011 at 12:29 PM, <starcluster-request at mit.edu> wrote:
>>>>>> 
>>>>>> ---------- Forwarded message ----------
>>>>>> From: Archie Russell <archier at gmail.com>
>>>>>> To: starcluster at mit.edu
>>>>>> Date: Thu, 27 Jan 2011 11:40:00 -0800
>>>>>> Subject: [StarCluster] Starcluster and elastic load balancing
>>>>>> 
>>>>>> Hi,
>>>>>> Online it says StarCluster has Elastic Load Balancing built into the
>>>>>> latest code version at GitHub. How would I go about using this? How
>>>>>> does it work, e.g. when does it fire up new nodes and when does it
>>>>>> shut them down?
>>>>>> Thanks,
>>>>>> Archie
>>>>>> _______________________________________________
>>>>>> StarCluster mailing list
>>>>>> StarCluster at mit.edu
>>>>>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 




