[StarCluster] Parallelization of MPI application with Star Cluster

Rayson Ho raysonlogin at gmail.com
Fri May 9 00:30:27 EDT 2014


We benchmarked AWS enhanced networking late last year and at the beginning
of this year:

http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html
http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html

There are a few things that can affect MPI performance on AWS with enhanced
networking:

1) Make sure that you are using a VPC, because instances launched outside
a VPC fall back to standard networking.

2) Make sure that all your instances are in the same AWS placement group;
otherwise the latency will be much higher.

3) Finally, you didn't specify the instance type. It's important to know
which instance type you used for the test, since only some instance types
support enhanced networking.
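As a rough illustration of points 1) to 3), here is a minimal Python sketch that scans instance descriptions for these problems. It assumes the dictionaries have the shape of the "Instances" entries returned by boto3's describe_instances() (fields VpcId, Placement.GroupName, SriovNetSupport, InstanceType); the function name check_instances is hypothetical, and instance types that use ENA report EnaSupport instead of SriovNetSupport, so adapt that check if needed.

```python
from collections import Counter


def check_instances(instances):
    """Return human-readable problems that would hurt MPI latency.

    `instances` is assumed to be a list of dicts shaped like the
    "Instances" entries of boto3's ec2.describe_instances() response.
    """
    problems = []
    groups = Counter()  # placement group name -> number of instances
    types = Counter()   # instance type -> number of instances
    for inst in instances:
        iid = inst.get("InstanceId", "?")
        # Point 1: instances outside a VPC get standard networking only.
        if not inst.get("VpcId"):
            problems.append(f"{iid}: not in a VPC (standard networking only)")
        # Point 2: every node should share one placement group.
        group = inst.get("Placement", {}).get("GroupName", "")
        if not group:
            problems.append(f"{iid}: not in a placement group")
        groups[group] += 1
        # Intel 82599 VF enhanced networking is reported as "simple";
        # assumption: ENA-based types would need an EnaSupport check instead.
        if inst.get("SriovNetSupport") != "simple":
            problems.append(f"{iid}: SriovNetSupport is not 'simple'")
        # Point 3: record the instance type so it can be reported.
        types[inst.get("InstanceType", "?")] += 1
    if len(groups) > 1:
        problems.append(f"instances span several placement groups: {sorted(groups)}")
    if len(types) > 1:
        problems.append(f"mixed instance types: {sorted(types)}")
    return problems


# Possible usage with boto3 (requires AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="eu-west-1")
#   resp = ec2.describe_instances()
#   instances = [i for r in resp["Reservations"] for i in r["Instances"]]
#   for problem in check_instances(instances):
#       print(problem)
```

An empty list only means none of these particular misconfigurations were found; it says nothing about other causes of poor scaling.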

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html


On Thu, May 8, 2014 at 1:30 PM, Torstein Fjermestad
<tfjermestad at gmail.com> wrote:

> Dear all,
>
> I am planning to use StarCluster to run Quantum Espresso
> (http://www.quantum-espresso.org/) calculations. For those who are not
> familiar with Quantum Espresso, it is a code for quantum mechanical
> calculations on materials. For these calculations to scale well with
> the number of CPUs, fast communication hardware is necessary.
>
> For this reason, I configured a cluster based on the HVM-EBS image:
>
> [1] ami-ca4abfbd eu-west-1 starcluster-base-ubuntu-13.04-x86_64-hvm
> (HVM-EBS)
>
> Then I followed the instructions on this site
>
>
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#test-enhanced-networking
>
> to check that "enhanced networking" was indeed enabled. Running the
> suggested commands gave me the same output as in the examples. This
> indicated that "enhanced networking" is enabled in the image.
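Because that check was run on the image, it may be worth repeating it on every node of the final cluster. A minimal Python sketch, assuming the interface is eth0 and that ethtool is installed; the set of driver names (ixgbevf for the Intel 82599 VF interface, ena for the newer Elastic Network Adapter) is an assumption to verify against the AWS guide for your instance type, and the helper names are hypothetical.

```python
import subprocess

# Assumed enhanced-networking drivers; check the AWS docs for your type.
ENHANCED_DRIVERS = {"ixgbevf", "ena"}


def parse_ethtool_info(text):
    """Turn `ethtool -i` output ("key: value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def has_enhanced_networking(text):
    """True if the reported driver is one of the assumed enhanced drivers."""
    return parse_ethtool_info(text).get("driver") in ENHANCED_DRIVERS


def check_interface(iface="eth0"):
    """Run `ethtool -i IFACE` on this node and check its driver."""
    out = subprocess.run(
        ["ethtool", "-i", iface], capture_output=True, text=True, check=True
    ).stdout
    return has_enhanced_networking(out)


if __name__ == "__main__":
    print("enhanced networking driver:", check_interface())
```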
>
> On this image I installed Quantum Espresso (using apt-get install),
> created a new image from it, and launched the final cluster from that image.
>
> On this cluster, I carried out some parallelization tests by running the
> same Quantum Espresso calculation on different numbers of CPUs. The
> results are below:
>
>   # procs   CPU time    wall time
>         4   4m23.98s    5m 0.10s
>         8   2m46.25s    2m49.30s
>        16   1m40.98s    4m 2.82s
>        32   0m57.70s    3m36.15s
>
> Except for the test run with 8 CPUs, the wall time is significantly
> longer than the CPU time. This usually indicates slow communication
> between the CPUs/nodes.
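To put numbers on this, the timings can be turned into a wall/CPU ratio and a speedup and parallel efficiency relative to the 4-process run. A minimal Python sketch; the parser only handles times written as in the table (minutes followed by seconds), and the names to_seconds and summarize are hypothetical.

```python
def to_seconds(t):
    """Parse times written like '4m23.98s' or '5m 0.10s' into seconds."""
    t = t.replace(" ", "")
    minutes, _, rest = t.partition("m")
    return 60 * float(minutes) + float(rest.rstrip("s"))


# processes: (CPU time, wall time), copied from the table above
RUNS = {
    4: ("4m23.98s", "5m 0.10s"),
    8: ("2m46.25s", "2m49.30s"),
    16: ("1m40.98s", "4m 2.82s"),
    32: ("0m57.70s", "3m36.15s"),
}


def summarize(runs, base=4):
    """Rows of (procs, cpu_s, wall_s, wall/cpu, speedup, efficiency) vs `base` procs."""
    base_wall = to_seconds(runs[base][1])
    rows = []
    for nproc in sorted(runs):
        cpu, wall = (to_seconds(x) for x in runs[nproc])
        speedup = base_wall / wall            # wall-clock speedup over the base run
        efficiency = speedup * base / nproc   # 1.0 would be ideal scaling
        rows.append((nproc, cpu, wall, wall / cpu, speedup, efficiency))
    return rows


if __name__ == "__main__":
    print(f"{'procs':>5} {'wall/CPU':>9} {'speedup':>8} {'eff.':>6}")
    for nproc, cpu, wall, ratio, speedup, eff in summarize(RUNS):
        print(f"{nproc:>5} {ratio:>9.2f} {speedup:>8.2f} {eff:>6.2f}")
```

With these numbers the 8-process run has a wall/CPU ratio of about 1.02 and an efficiency of about 0.89, while the 32-process run has a ratio of about 3.75 and is only about 1.39 times faster than the 4-process run, which is consistent with time being lost waiting on communication rather than computing.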
>
> My question is therefore whether there is a way to check the communication
> speed between the nodes / CPUs.
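One rough way to check node-to-node latency is a TCP ping-pong: one node echoes small messages and another times the round trips. Below is a minimal Python sketch of that technique; the function names and port are hypothetical, Python adds overhead so the result is only an upper bound on what MPI sees, and an MPI-level benchmark such as osu_latency from the OSU Micro-Benchmarks, launched with mpirun across two nodes, would be more representative.

```python
import socket
import statistics
import threading
import time


def start_echo_server(host="0.0.0.0", port=0):
    """Start a one-connection TCP echo server; returns (thread, bound_port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    port = srv.getsockname()[1]  # actual port if 0 was requested

    def serve():
        conn, _ = srv.accept()
        with conn, srv:
            # Disable Nagle's algorithm so small replies are sent immediately.
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            while True:
                data = conn.recv(65536)
                if not data:  # client closed the connection
                    break
                conn.sendall(data)

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return thread, port


def _recv_exact(sock, n):
    """Read exactly n bytes, since recv() may return partial data."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf.extend(chunk)
    return bytes(buf)


def ping_pong(host, port, size=8, iterations=1000, warmup=50):
    """Median round-trip time in microseconds for `size`-byte messages."""
    payload = b"x" * size
    rtts = []
    with socket.create_connection((host, port)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for i in range(warmup + iterations):
            t0 = time.perf_counter()
            s.sendall(payload)
            _recv_exact(s, size)
            if i >= warmup:  # discard warm-up round trips
                rtts.append((time.perf_counter() - t0) * 1e6)
    return statistics.median(rtts)
```

For example, call start_echo_server(port=5000) on one node and keep that process alive (e.g. with thread.join()), then call ping_pong("node001", 5000) from the master; node001 assumes StarCluster's default node naming. Running the same measurement inside and outside a placement group should make the difference visible.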
>
> The large difference between the CPU time and wall time may also be caused
> by an incorrect configuration of the cluster. Is there something I have
> done wrong / forgotten?
>
> Does anyone have suggestions on how I can fix this parallelization issue?
>
> Thanks in advance for your help.
>
> Regards,
> Torstein Fjermestad
>
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
