[StarCluster] MPICH Fabric

Fri May 16 12:45:45 EDT 2014

Hi Rayson

I have written a plugin that runs the Intel installer from a .tgz file on S3. It takes a while but seems to work.

I am using one of the stock public images:
[0] ami-3393a45a us-east-1 starcluster-base-ubuntu-13.04-x86_64 (EBS)

Should I be using the HVM image? I understood those are only needed for GPU computing.

How can I tell whether the enhanced networking driver is set up correctly? On the paravirtual machine, lspci etc. show nothing.

Thanks for the suggestions!

David Stuebe
Scientist & Software Engineer – RPS ASA

55 Village Square Drive
South Kingstown, RI 02879-8248

Tel: +1 (401) 789-6224
Email: David.Stuebe at rpsgroup.com
www: http://www.asascience.com/ | http://www.rpsgroup.com/

A member of the RPS Group plc

From: Rayson Ho <raysonlogin at gmail.com>
Date: Fri, 16 May 2014 09:13:03 -0400
To: David Stuebe <dstuebe at asascience.com>
Cc: starcluster at mit.edu
Subject: Re: [StarCluster] MPICH Fabric

How are you deploying the Intel Cluster Compiler Suite? If you are using a custom AMI, make sure that the AWS enhanced networking NIC driver is set up correctly. Also make sure that your instances are all in a placement group and in a VPC (StarCluster should set those up if you are using the latest stable version).
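
One way to check the driver from inside a running instance, as a rough sketch: AWS documents `ethtool -i eth0` reporting `driver: ixgbevf` when enhanced networking is active on the C3 family, and documents enhanced networking as requiring an HVM AMI. The helper names and the `eth0` default below are illustrative assumptions, not part of StarCluster or of any AWS tool.

```python
import subprocess


def parse_ethtool_driver(ethtool_output):
    """Return the value of the 'driver:' line in `ethtool -i` output, or None."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "driver":
            return value.strip()
    return None


def has_enhanced_networking(interface="eth0"):
    """Hypothetical helper: True if the interface is bound to ixgbevf.

    Assumes `ethtool` is installed and that the primary interface is eth0.
    """
    output = subprocess.check_output(["ethtool", "-i", interface]).decode()
    return parse_ethtool_driver(output) == "ixgbevf"
```

Calling `has_enhanced_networking()` on each node would show whether the SR-IOV virtual function driver is in use. Any other driver name suggests the instance is still on the standard virtual NIC.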

We benchmarked AWS enhanced networking on the C3 family a few months ago, and the latency is around 20% better on a pair of C3.8xlarge instances in a placement group with AWS enhanced networking enabled:

http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html

On Thu, May 15, 2014 at 3:18 PM, David Stuebe <DStuebe at asascience.com> wrote:

Hi Starcluster

Does anyone have advice on what fabric to use when running on AWS?

I know the interconnect is supposed to be 10 GigE, but my model depends more on latency than on throughput.
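
To illustrate why latency can matter more than link speed, here is a minimal sketch of the common latency-plus-bandwidth cost model for a single message. The 10 Gbit/s figure comes from the 10 GigE interconnect mentioned above. The 100 microsecond latency is purely an assumed placeholder, not a measurement from AWS or from this model.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bits_per_s):
    """Estimated time for one message: fixed latency plus serialization time."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


def latency_fraction(message_bytes, latency_s, bandwidth_bits_per_s):
    """Share of the total transfer time that is spent on latency alone."""
    total = transfer_time(message_bytes, latency_s, bandwidth_bits_per_s)
    return latency_s / total


ASSUMED_LATENCY_S = 100e-6   # hypothetical value, for illustration only
LINK_BITS_PER_S = 10e9       # nominal 10 GigE

for size in (1024, 1024 * 1024, 100 * 1024 * 1024):
    share = latency_fraction(size, ASSUMED_LATENCY_S, LINK_BITS_PER_S)
    print(f"{size:>11d} bytes: {share:.1%} of the time is latency")
```

Under these assumptions, small messages are dominated almost entirely by latency, so a faster link would not help much. Only large transfers are limited by bandwidth.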

I have had to use the Intel Cluster Compiler Suite rather than the built-in OpenMPI. I am hoping to resolve those issues and compare the two – I am interested to see the performance differences…

Currently the model actually has a negative performance curve as I add processors past a single node.

Model performance running on Amazon:

C3.8Xlarge - 1 instance, 32 cores
!   IINT         SIMTIME(UTC)            FINISH IN       SECS/IT   PERCENT COMPLETE
!8396282   2014-03-18T00:18:02.000000   0000:07:54:19     0.1103  |                    |

C3.8Xlarge - 2 instances, 64 cores
!8395221   2014-03-18T00:00:21.000000   0000:20:27:58     0.2843  |                    |

C3.8Xlarge - 3 instances, 96 cores
!8395273   2014-03-18T00:01:13.000000   0007:18:33:19     2.5918  |                    |
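
The scaling behind these numbers can be worked out from the SECS/IT column. The sketch below takes the three reported values, uses the 32-core run as the baseline, and computes speedup and parallel efficiency. The function and variable names are illustrative only.

```python
# Seconds per iteration (SECS/IT) from the three runs above, keyed by core count.
secs_per_iter = {32: 0.1103, 64: 0.2843, 96: 2.5918}


def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows relative to the smallest run."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Efficiency compares the achieved speedup with the ideal cores/base_cores.
        efficiency = speedup * base_cores / cores
        rows.append((cores, speedup, efficiency))
    return rows


for cores, speedup, efficiency in scaling_table(secs_per_iter):
    print(f"{cores:3d} cores: speedup {speedup:.3f}, efficiency {efficiency:.3f}")
```

A speedup below 1.0 means the run is slower than on a single instance. With these figures, each iteration takes roughly 2.6 times longer on two instances and roughly 23 times longer on three.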

David Stuebe
