[StarCluster] compiling MPI applications on starcluster
Torstein Fjermestad
tfjermestad at gmail.com
Tue Apr 29 09:16:40 EDT 2014
Dear Justin and Gonçalo,
thank you very much. It worked!
I first ran the two suggested commands:
$ update-alternatives --config mpi
$ update-alternatives --config mpirun
and then I reconfigured and recompiled the application.
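
In case it helps others hitting the same problem: after switching the
alternatives and recompiling, the commands already quoted further down in
this thread make a quick consistency check (pw.x is the executable in my
case; substitute your own):

$ mpif90 -v
$ mpirun --version
$ ldd pw.x | grep -i mpi

Once the compiler wrapper, mpirun and the library the executable links
against all report the same MPI implementation, the "running on 1
processors" symptom should be gone.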
Regards,
Torstein
On Mon, Apr 28, 2014 at 4:43 PM, Justin Riley <jtriley at mit.edu> wrote:
> Gonçalo,
>
> Ah, I thought this sounded familiar:
>
> https://github.com/jtriley/StarCluster/issues/370
>
> Thanks for responding. This will be fixed in the upcoming 14.04 AMIs.
>
> Torstein, you can update the MPI links interactively by running the
> following commands as root:
>
> $ update-alternatives --config mpi
> $ update-alternatives --config mpirun
>
> Select either all openmpi or all mpich paths at the interactive prompts.
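>
> For scripting the same change (e.g. across several nodes) it can also be
> done non-interactively. A minimal sketch, assuming you want the all-Open
> MPI combination and using the alternative paths from the --display output
> quoted below:
>
> $ update-alternatives --set mpi /usr/lib/openmpi/include
> $ update-alternatives --set mpirun /usr/bin/mpirun.openmpi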
>
> ~Justin
>
> On Mon, Apr 28, 2014 at 02:06:50PM +0200, Gonçalo Albuquerque wrote:
> > Hi,
> > When using AMI ami-6b211202 in us-east I stumbled across the same issue
> > you're experiencing.
> > The symbolic links in the alternatives system are mixing MPICH and
> > OpenMPI:
> > root at master:/etc/alternatives# update-alternatives --display mpi
> > mpi - auto mode
> > link currently points to /usr/include/mpich2
> > /usr/include/mpich2 - priority 40
> > slave libmpi++.so: /usr/lib/libmpichcxx.so
> > slave libmpi.so: /usr/lib/libmpich.so
> > slave libmpif77.so: /usr/lib/libfmpich.so
> > slave libmpif90.so: /usr/lib/libmpichf90.so
> > slave mpic++: /usr/bin/mpic++.mpich2
> > slave mpic++.1.gz: /usr/share/man/man1/mpic++.mpich2.1.gz
> > slave mpicc: /usr/bin/mpicc.mpich2
> > slave mpicc.1.gz: /usr/share/man/man1/mpicc.mpich2.1.gz
> > slave mpicxx: /usr/bin/mpicxx.mpich2
> > slave mpicxx.1.gz: /usr/share/man/man1/mpicxx.mpich2.1.gz
> > slave mpif77: /usr/bin/mpif77.mpich2
> > slave mpif77.1.gz: /usr/share/man/man1/mpif77.mpich2.1.gz
> > slave mpif90: /usr/bin/mpif90.mpich2
> > slave mpif90.1.gz: /usr/share/man/man1/mpif90.mpich2.1.gz
> > /usr/lib/openmpi/include - priority 40
> > slave libmpi++.so: /usr/lib/openmpi/lib/libmpi_cxx.so
> > slave libmpi.so: /usr/lib/openmpi/lib/libmpi.so
> > slave libmpif77.so: /usr/lib/openmpi/lib/libmpi_f77.so
> > slave libmpif90.so: /usr/lib/openmpi/lib/libmpi_f90.so
> > slave mpiCC: /usr/bin/mpic++.openmpi
> > slave mpiCC.1.gz: /usr/share/man/man1/mpiCC.openmpi.1.gz
> > slave mpic++: /usr/bin/mpic++.openmpi
> > slave mpic++.1.gz: /usr/share/man/man1/mpic++.openmpi.1.gz
> > slave mpicc: /usr/bin/mpicc.openmpi
> > slave mpicc.1.gz: /usr/share/man/man1/mpicc.openmpi.1.gz
> > slave mpicxx: /usr/bin/mpic++.openmpi
> > slave mpicxx.1.gz: /usr/share/man/man1/mpicxx.openmpi.1.gz
> > slave mpif77: /usr/bin/mpif77.openmpi
> > slave mpif77.1.gz: /usr/share/man/man1/mpif77.openmpi.1.gz
> > slave mpif90: /usr/bin/mpif90.openmpi
> > slave mpif90.1.gz: /usr/share/man/man1/mpif90.openmpi.1.gz
> > Current 'best' version is '/usr/include/mpich2'.
> > root at master:/etc/alternatives# update-alternatives --display mpirun
> > mpirun - auto mode
> > link currently points to /usr/bin/mpirun.openmpi
> > /usr/bin/mpirun.mpich2 - priority 40
> > slave mpiexec: /usr/bin/mpiexec.mpich2
> > slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.mpich2.1.gz
> > slave mpirun.1.gz: /usr/share/man/man1/mpirun.mpich2.1.gz
> > /usr/bin/mpirun.openmpi - priority 50
> > slave mpiexec: /usr/bin/mpiexec.openmpi
> > slave mpiexec.1.gz: /usr/share/man/man1/mpiexec.openmpi.1.gz
> > slave mpirun.1.gz: /usr/share/man/man1/mpirun.openmpi.1.gz
> > Current 'best' version is '/usr/bin/mpirun.openmpi'.
> > You are indeed compiling with MPICH and trying to run with OpenMPI. The
> > solution is to change the symbolic links using the update-alternatives
> > command. For the runtime link (mpirun), this must be done on all nodes
> > of the cluster (a sketch follows below).
> > No doubt this will be corrected in upcoming versions of the AMIs.
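> >
> > A sketch of how that might be done from the master, assuming the default
> > StarCluster node names (node001, node002, ...) and passwordless root ssh
> > between nodes; adjust to your own cluster:
> >
> > $ update-alternatives --set mpirun /usr/bin/mpirun.openmpi
> > $ for n in node001 node002; do ssh $n "update-alternatives --set mpirun /usr/bin/mpirun.openmpi"; done
> >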
> > Regards,
> > Gonçalo
> >
> > On Mon, Apr 28, 2014 at 1:09 PM, Torstein Fjermestad
> > <tfjermestad at gmail.com> wrote:
> >
> > Dear Justin,
> >
> > During the compilation, the cluster consisted only of the master node,
> > which is of instance type c3.large. In order to run a test parallel
> > calculation, I added a node of instance type c3.4xlarge (16 processors).
> >
> > The cluster is created from the following AMI:
> > [0] ami-044abf73 eu-west-1 starcluster-base-ubuntu-13.04-x86_64 (EBS)
> >
> > Executing the application outside the queueing system, as in
> >
> > mpirun -np 2 -hostfile hosts ./pw.x -in inputfile.inp
> >
> > did not change anything.
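> >
> > For reference, the hosts file here follows the usual Open MPI hostfile
> > format (one host per line, optionally with a slot count), along the
> > lines of the illustration below; the node names are placeholders only:
> >
> > master slots=2
> > node001 slots=16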
> >
> > The output of the command "mpirun --version" is the following:
> >
> > mpirun (Open MPI) 1.4.5
> >
> > Report bugs to http://www.open-mpi.org/community/help/
> >
> > After investigating the matter a little, I found that mpif90 likely
> > belongs to a different MPI implementation than mpirun.
> > The first line of the output of the command "mpif90 -v" is the
> > following:
> >
> > mpif90 for MPICH2 version 1.4.1
> >
> > Furthermore, the output of the command "ldd pw.x" indicates that pw.x is
> > compiled with MPICH2 and not with Open MPI. The output is the following:
> >
> > linux-vdso.so.1 => (0x00007fffd35fe000)
> > liblapack.so.3 => /usr/lib/liblapack.so.3 (0x00007ff38fb18000)
> > libopenblas.so.0 => /usr/lib/libopenblas.so.0 (0x00007ff38e2f5000)
> > libmpich.so.3 => /usr/lib/libmpich.so.3 (0x00007ff38df16000)
> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007ff38dcf9000)
> > libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007ff38d9e5000)
> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ff38d6df000)
> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ff38d4c9000)
> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff38d100000)
> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007ff38cef7000)
> > libcr.so.0 => /usr/lib/libcr.so.0 (0x00007ff38cced000)
> > libmpl.so.1 => /usr/lib/libmpl.so.1 (0x00007ff38cae8000)
> > /lib64/ld-linux-x86-64.so.2 (0x00007ff390820000)
> > libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007ff38c8b2000)
> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ff38c6ae000)
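> >
> > (A shorter way to see just the MPI-related entries, assuming the same
> > executable, would be something like
> >
> > $ ldd pw.x | grep -i mpi
> >
> > which here only shows libmpich.so.3, i.e. an MPICH2 build.)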
> >
> > The feedback I got from the Quantum Espresso mailing list suggested that
> > the cause of the error could be that pw.x (the executable) was not
> > compiled with the same version of MPI as mpirun.
> > The output of the commands "mpirun --version", "mpif90 -v" and "ldd
> > pw.x" above has led me to suspect that this is indeed the case.
> > I therefore wonder whether it is possible to control which MPI version I
> > compile my applications with.
> >
> > If, with the current MPI installation, the applications are compiled
> > with a different MPI version than mpirun, then I will likely have
> > similar problems when compiling other applications as well. I would
> > therefore very much appreciate it if you could give me some hints on how
> > I can solve this problem.
> >
> > Thanks in advance.
> >
> > Regards,
> > Torstein
> >
> > On Thu, Apr 24, 2014 at 5:13 PM, Justin Riley <jtriley at mit.edu>
> > wrote:
> >
> > Hi Torstein,
> >
> > Can you please describe your cluster configuration (i.e. size, image
> > id(s), instance type(s))? Also, you're currently using the SGE/OpenMPI
> > integration. Have you tried just using mpirun as described in the
> > first part of:
> >
> > http://star.mit.edu/cluster/docs/latest/guides/sge.html#submitting-openmpi-jobs-using-a-parallel-environment
> >
> > Also, what does 'mpirun --version' show?
> >
> > ~Justin
> > On Thu, Apr 17, 2014 at 07:19:28PM +0200, Torstein Fjermestad wrote:
> > > Dear all,
> > >
> > > I recently tried to compile an application (Quantum Espresso,
> > > http://www.quantum-espresso.org/) to be used for parallel computations
> > > on StarCluster. The installation procedure of the application consists
> > > of the standard "./configure + make" steps. At the end of the output
> > > from ./configure, the statement "Parallel environment detected
> > > successfully. Configured for compilation of parallel executables."
> > > appears.
> > >
> > > The compilation with "make" completes without errors. I then run the
> > > application in the following way:
> > >
> > > I first write a submit script (submit.sh) with the following content:
> > >
> > > cp /path/to/executable/pw.x .
> > > mpirun ./pw.x -in input.inp
> > >
> > > I then submit the job to the queueing system with the following
> > > command:
> > >
> > > qsub -cwd -pe orte 16 ./submit.sh
> > >
> > > However, in the output of the calculation, the following line is
> > > repeated 16 times:
> > >
> > > Parallel version (MPI), running on 1 processors
> > >
> > > It therefore seems like the program runs 16 one-processor calculations
> > > that all write to the same output.
> > >
> > > I wrote about this problem to the mailing list of Quantum Espresso,
> > > and I got the suggestion that perhaps the mpirun belonged to a
> > > different MPI library than the one pw.x (a particular package of
> > > Quantum Espresso) was compiled with.
> > >
> > > I compiled pw.x on the same cluster where I executed mpirun. Are there
> > > several versions of Open MPI on the AMIs provided by StarCluster? In
> > > that case, how can I choose the correct one?
> > >
> > > Perhaps the problem has a different cause. Does anyone have
> > > suggestions on how to solve it?
> > >
> > > Thanks in advance for your help.
> > >
> > > Yours sincerely,
> > > Torstein Fjermestad
> > >