[StarCluster] Integration of MPICH2 plugin with SGE

Hyokun Yun yun3 at purdue.edu
Mon Aug 19 00:53:07 EDT 2013


Dear StarCluster users,


I am experiencing a problem using the MPICH2 plugin with SGE.

I am using the following image: ami-52a0c53b, which runs Ubuntu 12.04.

When I use the mpich2 plugin, MPICH2 and SGE do not seem to be tightly
integrated: when I submit my script with qsub, I get the following error
messages.

error: executing task of job 1 failed: execution daemon on host "node001" didn't accept task
error: executing task of job 1 failed: execution daemon on host "node002" didn't accept task
error: executing task of job 1 failed: execution daemon on host "node003" didn't accept task
error: executing task of job 1 failed: execution daemon on host "nodef004" didn't accept task

The script runs fine when I simply execute 'mpirun' myself instead of
relying on SGE. The same script also runs fine under SGE when I use OpenMPI
instead of MPICH2. That is why I suspect an MPICH2 and SGE integration issue.

The problem is that I need multi-thread support, which is disabled by
default in OpenMPI.  I also prefer to use MPICH2 instead of OpenMPI.

I was able to reproduce the problem after restarting the cluster from
scratch.  Would any of you please take a look at the problem by trying the
same image with the MPICH2 plugin?


Thanks,
Hyokun Yun