[StarCluster] compiling MPI applications on starcluster

Torstein Fjermestad tfjermestad at gmail.com
Mon Apr 28 07:09:24 EDT 2014


Dear Justin,

During the compilation, the cluster consisted only of the master node, which
is of instance type c3.large. To run a test parallel calculation, I added a
node of instance type c3.4xlarge (16 processors).

The cluster is created from the following AMI:
[0] ami-044abf73 eu-west-1 starcluster-base-ubuntu-13.04-x86_64 (EBS)

Executing the application outside the queuing system like

mpirun -np 2 -hostfile hosts ./pw.x -in inputfile.inp

did not change anything.

The output of the command "mpirun --version" is the following:

mpirun (Open MPI) 1.4.5

Report bugs to http://www.open-mpi.org/community/help/

After investigating the matter a little, I found that mpif90 likely belongs
to a different MPI implementation than mpirun.
The first line of the output of the command "mpif90 -v" is the following:

mpif90 for MPICH2 version 1.4.1

Furthermore, the output of the command "ldd pw.x" indicates that pw.x is
linked against MPICH2 and not Open MPI. The output is the following:

    linux-vdso.so.1 =>  (0x00007fffd35fe000)
    liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007ff38fb18000)
    libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007ff38e2f5000)
    libmpich.so.3 => /usr/lib/libmpich.so.3 (0x00007ff38df16000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff38dcf9000)
    libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007ff38d9e5000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff38d6df000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff38d4c9000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff38d100000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff38cef7000)
    libcr.so.0 => /usr/lib/libcr.so.0 (0x00007ff38cced000)
    libmpl.so.1 => /usr/lib/libmpl.so.1 (0x00007ff38cae8000)
    /lib64/ld-linux-x86-64.so.2 (0x00007ff390820000)
    libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007ff38c8b2000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff38c6ae000)

The feedback I got from the Quantum Espresso mailing list suggested that
the cause of the error could be that pw.x (the executable) was not compiled
with the same MPI implementation that mpirun belongs to.
The output of the commands "mpirun --version", "mpif90 -v" and "ldd pw.x"
above has led me to suspect that this is indeed the case.
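
The three checks above can be made repeatable with a small shell helper.
The following is only a sketch: the function name mpi_flavor is
hypothetical (it is not part of any MPI distribution), and it recognises
only the marker strings that appear in the outputs quoted in this message
("libmpich", "MPICH", "Open MPI"), plus "libmpi.so", which is assumed here
to be the name of the Open MPI shared library.

```sh
# mpi_flavor: hypothetical helper that reads text on stdin and prints
# "mpich", "openmpi" or "unknown" depending on which markers it finds.
mpi_flavor() {
    input=$(cat)
    case "$input" in
        *libmpich*|*MPICH*)          echo "mpich" ;;
        *libmpi.so*|*"Open MPI"*)    echo "openmpi" ;;
        *)                           echo "unknown" ;;
    esac
}

# Intended use (run on the cluster, not executed here):
#   mpirun --version 2>&1 | mpi_flavor
#   mpif90 -v 2>&1        | mpi_flavor
#   ldd ./pw.x            | mpi_flavor
```

If the three commands do not all report the same flavor, the launcher, the
compiler wrapper and the executable do not come from the same MPI
implementation.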

I therefore wonder whether it is possible to control which MPI
implementation I compile my applications with.

If, with the current MPI installation, applications are compiled with a
different MPI implementation than the one mpirun belongs to, then I will
likely have similar problems when compiling other applications as well. I
would therefore very much appreciate it if you could give me some hints on
how I can solve this problem.
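
One possible way to keep the launcher consistent with the MPICH2-linked
executable is sketched below. It rests on assumptions not confirmed in this
thread: that the Ubuntu packages install implementation-specific launchers
under suffixed names such as mpirun.mpich2 or mpiexec.hydra, and that the
plain mpirun and mpif90 names are symlinks managed by the Debian/Ubuntu
alternatives system (in which case "update-alternatives --display mpirun"
may show where they point). The helper name first_available is hypothetical.

```sh
# first_available: hypothetical helper that prints the first of its
# arguments that exists as a command on PATH; fails if none is found.
first_available() {
    for name in "$@"; do
        if command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# The candidate names are assumptions; check them with: ls /usr/bin/mpi*
MPIRUN=$(first_available mpirun.mpich2 mpirun.mpich mpiexec.hydra) || MPIRUN=""
echo "MPICH launcher: ${MPIRUN:-not found}"

# Intended use on the cluster (not executed here):
#   "$MPIRUN" -np 2 -hostfile hosts ./pw.x -in inputfile.inp
```

The alternative is to go the other way: recompile pw.x with the Open MPI
compiler wrapper so that it matches the mpirun that is already the default.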

Thanks in advance.

Regards,
Torstein

On Thu, Apr 24, 2014 at 5:13 PM, Justin Riley <jtriley at mit.edu> wrote:

> Hi Torstein,
>
> Can you please describe your cluster configuration (ie size, image id(s),
> instance type(s))? Also, you're currently using the SGE/OpenMPI
> integration. Have you tried just using mpirun only as described in the
> first part of:
>
>
> http://star.mit.edu/cluster/docs/latest/guides/sge.html#submitting-openmpi-jobs-using-a-parallel-environment
>
> Also, what does 'mpirun --version' show?
>
> ~Justin
>
> On Thu, Apr 17, 2014 at 07:19:28PM +0200, Torstein Fjermestad wrote:
> >    Dear all,
> >
> >    I recently tried to compile an application (Quantum Espresso,
> >    [1]http://www.quantum-espresso.org/) to be used for parallel
> >    computations on StarCluster. The installation procedure of the
> >    application consists of the standard "./configure + make" steps.  At
> >    the end of the output from ./configure, the statement "Parallel
> >    environment detected successfully.\ Configured for compilation of
> >    parallel executables." appears.
> >
> >    The compilation with "make" completes without errors. I then run the
> >    application in the following way:
> >
> >    I first write a submit script (submit.sh) with the following content:
> >
> >    cp /path/to/executable/pw.x .
> >    mpirun ./pw.x -in input.inp
> >
> >    I then submit the job to the queueing system with the following
> >    command
> >
> >    qsub -cwd -pe orte 16 ./submit.sh
> >
> >    However, in the output of the calculation, the following line is
> >    repeated 16 times:
> >
> >    Parallel version (MPI), running on 1 processors
> >
> >    It therefore seems like the program runs 16 1 processor calculations
> >    that all write to the same output.
> >
> >    I wrote about this problem to the mailing list of Quantum Espresso,
> >    and I got the suggestion that perhaps the mpirun belonged to a
> >    different MPI library than pw.x (a particular package of Quantum
> >    Espresso) was compiled with.
> >
> >    I compiled pw.x on the same cluster as I executed mpirun. Are there
> >    several versions of openMPI on the AMIs provided by StarCluster? In
> >    that case, how can I choose the correct one.
> >
> >    Perhaps the problem has a different cause. Does anyone have
> >    suggestions on how to solve it?
> >
> >    Thanks in advance for your help.
> >
> >    Yours sincerely,
> >    Torstein Fjermestad
> >
> > References
> >
> >    Visible links
> >    1. http://www.quantum-espresso.org/
>
> > _______________________________________________
> > StarCluster mailing list
> > StarCluster at mit.edu
> > http://mailman.mit.edu/mailman/listinfo/starcluster
>
>