[StarCluster] IPython parallel DirectView not distributing to starcluster nodes.

Austin So austin.so at tomabio.com
Wed Jul 1 19:40:12 EDT 2015


Thanks.

Yes, I checked before and after executing the line, and nothing differed. So I relaunched:

>>> IPCluster has been started on SecurityGroup:@sc-toma for user 'sgeadmin'
with 159 engines on 5 nodes.

Through the IPython Notebook, I’m assigning 150 engines.

So using the following test code:

from IPython.parallel import Client
ipclient = Client(packer = 'pickle')
dview = ipclient[:]
lview = ipclient.load_balanced_view()

Here ipclient.ids is a list of 150 elements, as expected (checked with len(ipclient.ids)).

Now executing code:

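%%px --local
# Runs on every engine and, because of --local, in the notebook kernel too.
import socket
import pandas as pd

def distribute(a):
    # Report the hostname of whichever machine executes the call.
    return socket.getfqdn()

a = pd.DataFrame(range(0, 10000))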

version 1a:
dview.map(distribute, a[0]).get()

version 1b:
lview.map(distribute, a[0]).get()

--Both 1a and 1b return ‘master’ in every element.

version 2:
dview.scatter('a', a)
dview.execute('b = distribute(a[0])', block=True)
dview.gather('b', block=True)

--This also results in an output of ‘master’ in each element.

Checking len(ipclient.ids) again confirms that all the engines are in place.

A.





> On Jul 1, 2015, at 11:14 AM, MinRK <benjaminrk at gmail.com> wrote:
> 
> Can you perhaps share a code sample? Have you verified that all the engines are registered with the Client (`Client.ids`) before submitting the tasks?
> 
> -MinRK
> 
> On Wed, Jul 1, 2015 at 9:09 AM, Austin So <austin.so at tomabio.com> wrote:
> I’ve been trying to figure out what I’m doing wrong here, and if it is an issue within the starcluster config file. I’ve exhausted all possible implementations in my code that I could think of.
> 
> During setup, StarCluster reported at launch that 255 engines had been set up and were available to IPCluster.
> 
> Within IPython Notebooks, I’m trying to distribute a function across all my nodes and engines (5 at r3.8xlarge).
> 
> While the line of code is running, I watch qhost and see that only the master shows a CPU load. Cloud Metrics shows the same: only the master has a CPU load. At the suggestion of a friend, I returned a socket.getfqdn() call to identify whether the results were processed by the master or by one of the nodes. All results returned were generated by the master.
> 
> Any hints to identify where the source of the problem lies would be greatly appreciated.
> 
> Best
> 
> Austin
> 
> 
> 
> 
> 
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
> 
> 


