<div dir="ltr"><div>Dear StarCluster users,</div><div><br></div><div><br></div><div>I am experiencing a problem using the MPICH2 plugin with SGE.</div><div><br></div><div>I am using the following image: ami-52a0c53b, which uses Ubuntu 12.04.</div>
<div><br></div><div>When I use the MPICH2 plugin, MPICH2 and SGE do not seem to be tightly integrated: when I submit my script with qsub, I get the following error messages.</div><div><br></div><div>error: executing task of job 1 failed: execution daemon on host "node001" didn't accept task</div>
<div>error: executing task of job 1 failed: execution daemon on host "node002" didn't accept task</div><div>error: executing task of job 1 failed: execution daemon on host "node003" didn't accept task</div>
<div>error: executing task of job 1 failed: execution daemon on host "node004" didn't accept task</div><div><br></div><div>The script runs fine when I simply execute 'mpirun' myself instead of submitting it through SGE.</div>
<div>The same script also runs fine when I use OpenMPI instead of MPICH2. That is why I suspect an MPICH2 and SGE integration issue.</div><div><br></div><div>The problem is that I need multi-thread support, which is disabled by default in OpenMPI. I would also prefer to use MPICH2 rather than OpenMPI.</div>
<div><br></div><div>I was able to reproduce the problem after restarting the cluster from scratch. Could any of you please take a look at the problem by trying the same image with the MPICH2 plugin?</div><div><br></div><div>
<br></div><div>Thanks,</div><div>Hyokun Yun</div>
</div>