<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Oops, that should read <span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px">Intel Xeon E5-2670 v2 with 10 physical cores (20 threads). The conclusion about oversubscribing the 2 node </span><span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px">c3.4xlarge-based </span><span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px">cluster still stands though.</span></div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px"><br>
</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px">Gonçalo</span></div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, May 11, 2014 at 9:18 PM, Gonçalo Albuquerque <span dir="ltr"><<a href="mailto:albusquercus@gmail.com" target="_blank">albusquercus@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Hi Torstein,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">
<br></div><div class="gmail_default"><font face="arial, helvetica, sans-serif">Here are my 2 cents. To the best of my knowledge, the C3 instances are based on two-socket </font><font color="#333333" face="HelveticaNeue, Helvetica, Helvetica, Arial, sans-serif"><span style="font-size:14px;line-height:22.399999618530273px">Intel Xeon E5-2670 servers. This means 2x8=16 physical cores (2*16 threads with hyper-threading on). Your 2 c3.4xlarge nodes will only have 2*4=8 physical cores. By running a 32 process MPI job on a 2 node c3.4xlarge cluster you're actually oversubscribing the available computational resources, hence you have no more gain in CPU time.</span></font></div>
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px"><br>
</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px">Can you try with c3.8xlarge instances? Two c3.8xlarge nodes will provide you with 32 physical cores.</span></div>
<span class="HOEnZb"><font color="#888888">
<div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px"><br>
</span></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="color:rgb(51,51,51);font-family:HelveticaNeue,Helvetica,Helvetica,Arial,sans-serif;font-size:14px;line-height:22.399999618530273px">Gonçalo</span></div>
</font></span></div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, May 9, 2014 at 7:42 PM, Torstein Fjermestad <span dir="ltr"><<a href="mailto:tfjermestad@gmail.com" target="_blank">tfjermestad@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div><div><div><div>Dear Rayson,<br><br></div>thank you for your fast and informative reply. I have been studying the AWS and the starcluster documentation, and as far as I have understood VPC and the placement group are set up automatically. From the management console I see that all instances are in the same placement group and have the same VPC ID. <br>
<br></div>The instance types I am running are the following:<br></div><br>c3.large for the master node<br></div>c3.4xlarge for the two slave nodes<br><br></div>Today I redid the scaling test, and when adding the two c3.4xlarge nodes, I specified explicitly that they should be based on the HVM-EBS <br>
image (by using the -i option to addnode). I think I forgot to do this yesterday. <br></div>The results for 2, 4, 8, and 16 processors are now much better:<br><br>
        
        
        
        
        
        
<table cols="3" border="0" cellspacing="0">
        <colgroup span="2" width="85"></colgroup>
        <colgroup width="99"></colgroup>
        <tbody><tr>
                <td align="LEFT" height="17"># proc</td>
                <td align="LEFT">CPU time</td>
                <td align="LEFT">wall time</td>
        </tr>
        <tr>
                <td align="LEFT" height="16">2</td>
                <td align="LEFT">7m45.70s</td>
                <td align="LEFT">8m19.11s</td>
        </tr>
        <tr>
                <td align="LEFT" height="16">4</td>
                <td align="LEFT">3m28.29s</td>
                <td align="LEFT">3m22.40s</td>
        </tr>
        <tr>
                <td align="LEFT" height="16">8</td>
                <td align="LEFT">2m22.33s</td>
                <td align="LEFT">2m18.33s</td>
        </tr>
        <tr>
                <td align="LEFT" height="16">16</td>
                <td align="LEFT">1m18.18s</td>
                <td align="LEFT">1m20.59s</td>
        </tr>
        <tr>
                <td align="LEFT" height="16">32</td>
                <td align="LEFT">1m 0.05s</td>
                <td align="LEFT">3m 8.53s</td>
        </tr>
</tbody></table>

The exception is the result for 32 processors, where again the difference between the wall time and the CPU time is large. Does anyone have any suggestions as to what might be causing the poor performance of the 32-processor run?
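
For what it is worth, turning those wall times into speedup and parallel efficiency (taking the 2-process run as the baseline) makes the 32-process anomaly easy to see; a small sketch using the numbers from the table above:

# Speedup and parallel efficiency from the wall times quoted in the table,
# using the 2-process run as the baseline.
wall_times = {  # processes -> wall time in seconds
    2: 8 * 60 + 19.11,
    4: 3 * 60 + 22.40,
    8: 2 * 60 + 18.33,
    16: 1 * 60 + 20.59,
    32: 3 * 60 + 8.53,
}

base_procs = 2
base_time = wall_times[base_procs]
for procs in sorted(wall_times):
    speedup = base_time / wall_times[procs]
    ideal = procs / base_procs
    print("%2d procs: speedup %5.2fx, efficiency %3.0f%%"
          % (procs, speedup, 100 * speedup / ideal))

With these numbers the efficiency is still around 77% at 16 processes but collapses to roughly 17% at 32.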

Thanks in advance for your help.

Regards,
Torstein Fjermestad


On Fri, May 9, 2014 at 6:30 AM, Rayson Ho <raysonlogin@gmail.com> wrote:
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">We benchmarked AWS enhanced networking late last year & beginning of this year:<br><div><br><a href="http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html" target="_blank">http://blogs.scalablelogic.com/2013/12/enhanced-networking-in-aws-cloud.html</a><br>
<a href="http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html" target="_blank">http://blogs.scalablelogic.com/2014/01/enhanced-networking-in-aws-cloud-part-2.html</a><br><br></div><div>There are a few things that can affect MPI performance of AWS with enhanced networking:<br>
<br></div><div>1) Make sure that you are using a VPC, because instances in non-VPC default back to standard networking.<br><br>2) Make sure that your instances are all in a AWS Placement Group, or else the latency would be much longer.<br>
<br></div><div>3) Finally, you didn't specify the instance type -- it's important to know what kind of instances you used to perform the test...<br></div><div><div class="gmail_extra"><br clear="all"><div>Rayson<br>
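
A sketch using the boto3 SDK (assumed to be installed and configured with credentials; StarCluster itself uses boto, so treat this as illustrative) that lists each running instance's type, VPC, and placement group:

# List each running instance's type, VPC, and placement group so that the
# VPC / placement-group / instance-type points above can be verified at a glance.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # region taken from the AMI listed below
resp = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)
for reservation in resp["Reservations"]:
    for inst in reservation["Instances"]:
        print(inst["InstanceId"],
              inst["InstanceType"],
              inst.get("VpcId", "no VPC"),
              inst["Placement"].get("GroupName") or "no placement group")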

Rayson

==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
<br><br><div class="gmail_quote"><div><div>On Thu, May 8, 2014 at 1:30 PM, Torstein Fjermestad <span dir="ltr"><<a href="mailto:tfjermestad@gmail.com" target="_blank">tfjermestad@gmail.com</a>></span> wrote:<br>
</div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div>
<div dir="ltr"><div><div><div><div><div><div><div><div><div><div><div><div><div>Dear all,<br><br></div>I am planning to use Star Cluster to run Quantum Espresso (<a href="http://www.quantum-espresso.org/" target="_blank">http://www.quantum-espresso.org/</a>) calculations. For those who are not familiar with Quantum Espresso; it is a code to run quantum mechanical calculations on materials. In order for these types of calculations to achieve good scaling with respect to the number of CPU, fast communication hardware is necessary. <br>
<br></div>For this reason, I configured a cluster based on the HVM-EBS image:<br><br>[1] ami-ca4abfbd eu-west-1 starcluster-base-ubuntu-13.04-x86_64-hvm (HVM-EBS)<br><br></div>Then I followed the instructions on this site <br>
<br><a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#test-enhanced-networking" target="_blank">http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html#test-enhanced-networking</a><br>
<br></div>to check that "enhanced networking" was indeed enabled. Running the suggested commands gave me the same output as in the examples. This certainly indicated that "enhanced networking" is enabled in the image. <br>
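
For anyone repeating that check from a script, a small sketch that wraps the ethtool driver test from the guide above (it assumes the primary interface is named eth0 and that ethtool is installed):

# Report the driver of the primary network interface; with enhanced networking
# (SR-IOV) enabled it should be the ixgbevf virtual-function driver.
import subprocess

out = subprocess.check_output(["ethtool", "-i", "eth0"]).decode()
driver = next(line.split(":", 1)[1].strip()
              for line in out.splitlines() if line.startswith("driver"))
print("eth0 driver:", driver)
print("enhanced networking:", "yes" if driver == "ixgbevf" else "probably not")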

On this image I installed Quantum Espresso (via apt-get install) and generated a new, modified image from which I created the final cluster.

On this cluster I carried out some parallelization tests by running the same Quantum Espresso calculation on different numbers of CPUs. The results are below:
<table cols="3" border="0" cellspacing="0">
        <colgroup span="2" width="85"></colgroup>
        <colgroup width="99"></colgroup>
        <tbody><tr>
                <td align="LEFT" height="17"># proc</td>
                <td align="LEFT">CPU time</td>
                <td align="LEFT">wall time</td>
        </tr>
        <tr>
                <td align="RIGHT" height="16">4<br></td>
                <td align="LEFT">4m23.98s</td>
                <td align="LEFT">5m 0.10s</td>
        </tr>
        <tr>
                <td align="RIGHT" height="16">8</td>
                <td align="LEFT">2m46.25s</td>
                <td align="LEFT">2m49.30s</td>
        </tr>
        <tr>
                <td align="RIGHT" height="16">16</td>
                <td align="LEFT">1m40.98s</td>
                <td align="LEFT">4m 2.82s</td>
        </tr>
        <tr>
                <td align="RIGHT" height="16">32</td>
                <td align="LEFT">0m57.70s</td>
                <td align="LEFT">3m36.15s</td>
        </tr>
</tbody></table>

Except for the test run on 8 CPUs, the wall time is significantly longer than the CPU time. This is usually an indication of slow communication between the CPUs/nodes.

My question is therefore whether there is a way to check the communication speed between the nodes/CPUs.
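
One simple option is a point-to-point ping-pong test between two ranks placed on different nodes; a minimal sketch assuming mpi4py is installed on the nodes (illustrative only, not what Quantum Espresso itself uses), launched with something like: mpirun -np 2 -host node001,node002 python pingpong.py

# Measure average one-way latency between rank 0 and rank 1 with small messages.
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
n_iter = 1000
buf = bytearray(8)   # tiny message, so the timing is dominated by latency

comm.Barrier()
start = time.time()
for _ in range(n_iter):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
elapsed = time.time() - start

if rank == 0:
    # Each iteration is a full round trip, so halve the average time.
    print("average one-way latency: %.1f microseconds"
          % (elapsed / n_iter / 2 * 1e6))

Latencies running into hundreds of microseconds or more between nodes that should share a placement group would point at a networking or placement problem.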

The large difference between the CPU time and the wall time may also be caused by an incorrect configuration of the cluster. Is there something I have done wrong or forgotten?

Does anyone have suggestions on how I can fix this parallelization issue?

Thanks in advance for your help.

Regards,
Torstein Fjermestad

_______________________________________________
StarCluster mailing list
StarCluster@mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster