<div dir="ltr">Hi,<br><br>I am running a python program for distributed implementation of<br>gradient descent. The gradient descent step involves an Allreduce<br>function to obtain the overall gradient at the workers. I have been<br>setting up clusters earlier without Starcluster, but recently I needed<br>to use large clusters and had to move to Starcluster. I was surprised<br>to see that the MPI.Allreduce operation is much faster in a cluster<br>generated by Starcluster than in a cluster set up by using traditional<br>methods. I am curious to know if this is an artifact, or Starcluster<br>optimizes the communication network somehow to enable efficient<br>Allreduce step. I am attaching the code for a cluster of 43 nodes (1<br>master and 42 workers), though I have replaced the data loading with<br>random data initialization to mimic the gradient step. Any insight<br>regarding this would be extremely helpful.<br><br>Thanks,<br>Saurav.<div class="gmail-yj6qo"></div><br class="gmail-Apple-interchange-newline"></div>