[StarCluster] Light Reading - My thesis on Elastic Load Balancing

Kyeong Soo (Joseph) Kim kyeongsoo.kim at gmail.com
Sun Apr 3 13:25:18 EDT 2011


Hi Rajat,

Many thanks for sharing the thesis.
As I said, I've been really looking forward to seeing this and now I have it!

Because our 4-week Easter break here in the UK starts tomorrow,
I will have time to read your thesis.
Hopefully, I will get back to you with constructive feedback and comments.

Again, many thanks for your great contribution and for sharing the thesis.

Regards,
Joseph

P.S. I found all your figures rather blurred, perhaps because they use a
bitmap format (e.g., PNG). You may want to consider a vector graphics
format (e.g., PS/EPS) instead; MS Word has an EPS import filter.

--
Kyeong Soo (Joseph) Kim, Ph.D.
Senior Lecturer in Networking
Room 112, Digital Technium
Multidisciplinary Nanotechnology Centre, College of Engineering
Swansea University, Singleton Park, Swansea SA2 8PP, Wales UK
TEL: +44 (0)1792 602024
EMAIL: k.s.kim_at_swansea.ac.uk
HOME: http://iat-hnrl.swan.ac.uk/ (group)
            http://iat-hnrl.swan.ac.uk/~kks/ (personal)



On Sun, Apr 3, 2011 at 2:28 PM, Rajat Banerjee <rbanerj at fas.harvard.edu> wrote:
> Hello list,
> Here is my thesis describing my work on Elastic Load Balancing in
> StarCluster. Many thanks to Justin Riley for his help in getting this done.
>
> The entire PDF is located at:
> http://www.hindoogle.com/thesis/BanerjeeR_Thesis0316.pdf
> It is 71 pages long.
>
> Here is the abstract:
> Abstract
> Computing in the cloud provides companies and colleges a new way to perform
> sophisticated computational tasks. Amazon.com, Inc. (Amazon) is the leading
> provider of cloud infrastructure, and their solutions are used by thousands
> of companies, universities and individuals. Amazon’s service, dubbed Elastic
> Compute Cloud (EC2), allows users to rent servers by the hour, so that
> computing power can be increased and decreased as needed. It eliminates the
> need for companies to build and maintain expensive data centers. Instead
> customers can rent servers to perform tasks as needed, and turn them off
> when the tasks are completed.
> The ability to quickly add and remove computing capacity enables users to
> scale computing capacity in business and academic settings alike. When one
> needs to perform sophisticated calculations, process large data sets, or
> serve many concurrent clients, having more computing power improves
> throughput and responsiveness of the system. Tasks can be completed in less
> time and client requests can be served faster. In a traditional environment
> where a company or university builds and maintains every server in its data
> center, it takes days or even weeks to add new computing capacity, and costs
> a significant amount of money. Amazon EC2 allows for instant addition and
> removal of capacity, and their services are reasonably priced. A new server
> can be available in as little as five minutes and can then be terminated at
> any time. Server usage is billed by the hour, so users pay only for the
> hours they use. This flexibility, coupled with Amazon’s low prices, is a
> boon to anyone who needs to perform complex computational tasks for short or
> unpredictable time periods.
> Scientists performing High Performance Computing (HPC) commonly need
> enormous amounts of computing power for short periods of time. HPC tasks
> are crucially important to modern science and can
> range from the modeling of microscopic molecular interactions in a protein
> to a nuclear weapon simulation. Before the availability of cloud computing
> resources, HPC users ran their computational tasks almost exclusively on
> very expensive supercomputers, which can cost in excess of $500 per hour and
> must be reserved ahead of time. These supercomputers are installed at many
> major universities, corporations, and research laboratories, but are not
> easily accessible because of their high cost. The recent installation of
> IBM’s Roadrunner supercomputer at Los Alamos National Laboratory in New
> Mexico cost over $133 million.
> With program decomposition techniques, scientists can break up seemingly
> intractable problems into smaller, more manageable subtasks that run
> independently. The problem can be solved by these extremely powerful
> supercomputers, which distribute the subtasks among the many discrete
> processors within the supercomputer. The processors have speedy
> communication channels between them that offer plenty of bandwidth. When
> discrete subtasks within the larger problem need to share information, such
> as the attractive charges emitted by a molecule in a protein folding
> simulation, that information is sent fast and frequently over the
> inter-processor communication links. Protein folding simulations are
> particularly well suited to parallelization because small parts of the
> molecule can be simulated independently, and then the individual results can
> be used to find the ideal structure of the complete protein. Parallelized
> problems like this can be solved by powerful, expensive supercomputers, or
> can be solved in a cluster of computers that are cheaper and more readily
> available. Some problems have unique requirements, like continuous
> single-threaded access to a high-powered processor, and those problems are
> out of the scope of this project.
> A project called StarCluster brings the flexibility and low cost of
> clustered cloud computing to scientists and other users of High Performance
> Computers. Users can launch a cluster of Amazon EC2 servers, also called
> instances, through StarCluster and have a fully configured, ready-to-use
> computational cluster online in less than ten minutes, for as little as
> $0.08 per instance per hour. No reservations are required and a cluster of
> up to 20 machines can be launched at any time the user desires.
> StarCluster has made high performance computing in the cloud an affordable
> reality to many scientists who do not have access to expensive
> supercomputers. StarCluster, which is free, has approximately 500 users
> worldwide, most of whom are in academia. Using StarCluster incurs no
> additional fees beyond the nominal cost of per hour usage of EC2.
> StarCluster is a superb product for scientists who need supercomputing
> power, and who know how much time and computational resources they need to
> complete the tasks.
> Despite its many strengths, StarCluster does not easily adapt to changing
> workloads. This type of adaptability in the cloud is called elasticity. In
> StarCluster, when a cluster of instances is launched, the scientist must
> specify how many instances he or she wants. Those instances are launched
> together, and can only be terminated together. Instances cannot be
> terminated individually, even if one instance is idle. In some situations it
> is impossible to predict the workload of a cluster, such as when a scientist
> overestimates the duration of a task, or data processing runs faster than
> expected because an unexpected network upgrade transfers files faster. There
> are many reasons that a task could complete faster or slower than expected.
> It is a waste of money, in fees paid to Amazon, and a waste of energy, to
> keep many idle instances running indefinitely.
> This project, Elastic Load Balancing in EC2, aims to address this weakness
> in StarCluster by adding an Elastic Load Balancer to the project. The
> Elastic Load Balancer (ELB) will add instances to the cluster to improve job
> throughput when the cluster is heavily loaded, and terminate instances when
> they are idle to save money and energy. The ELB will periodically poll the
> cluster, analyze its workload, decide if the cluster needs to be modified,
> and add or remove instances. Through this process, StarCluster will maximize
> job throughput at busy times and save money at idle times.
> Several powerful Elastic Load Balancers are commercially available for Cloud
> and EC2 software setups, but StarCluster’s ELB is the only one specifically
> targeted toward the High Performance Computing domain. Existing ELB
> implementations are geared toward web server and application server
> environments and will be discussed in the Prior Work section. HPC jobs are
> long running and seldom serve external clients, a unique computing profile
> that mandates a new Elastic Load Balancing
> strategy.
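
The poll / analyze / decide / act cycle described in the abstract could be
sketched roughly as below. This is only an illustration, not code from the
thesis or from StarCluster: the names ClusterStats, decide and run_balancer,
the threshold values, and the 60-second interval are all assumptions made
for the example. Only the 20-node ceiling is taken from the abstract.

```python
import time
from dataclasses import dataclass


@dataclass
class ClusterStats:
    """Snapshot taken on each poll (hypothetical fields, not StarCluster's)."""
    queued_jobs: int   # jobs waiting for a free slot
    idle_nodes: int    # nodes with no running jobs
    total_nodes: int   # nodes currently in the cluster


def decide(stats, min_nodes=1, max_nodes=20, queue_threshold=1):
    """Return the change in node count: >0 to add, <0 to remove, 0 to hold."""
    if stats.queued_jobs >= queue_threshold and stats.total_nodes < max_nodes:
        # Heavily loaded: add one instance to improve job throughput.
        return 1
    if (stats.queued_jobs == 0 and stats.idle_nodes > 0
            and stats.total_nodes > min_nodes):
        # Idle: remove idle instances, but never drop below min_nodes.
        return -min(stats.idle_nodes, stats.total_nodes - min_nodes)
    return 0


def run_balancer(poll, add_nodes, remove_nodes,
                 interval_s=60, max_polls=None, sleep=time.sleep):
    """Poll the cluster, decide, and act, repeating every interval_s seconds.

    poll, add_nodes and remove_nodes are caller-supplied callables, so the
    loop itself makes no assumptions about EC2 or the job scheduler.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        delta = decide(poll())
        if delta > 0:
            add_nodes(delta)
        elif delta < 0:
            remove_nodes(-delta)
        polls += 1
        sleep(interval_s)
```

Keeping decide() free of side effects means the scaling policy can be
checked against canned snapshots without launching any instances.
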
>
> Any comments or questions are welcome. Best,
> Rajat
>
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>
>



