[StarCluster] using StarCluster with R and Rmpi
Dan Tenenbaum
dandante at dandante.com
Tue Oct 26 15:25:43 EDT 2010
Hi,
I am using StarCluster and it works great.
I am having a problem when I try to use it with the R language,
specifically the Rmpi package (which is R's wrapper around MPI).
I am guessing this might have something to do with the way I compiled
Rmpi. I need to tell it where the MPI headers can be found on the
system.
I'm on a 64-bit system (using the 64-bit AMI and the m1.large type).
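A quick way to check which prefix actually holds mpi.h before passing it to --with-mpi is sketched below. find_mpi_header is a hypothetical helper name (it is not part of Open MPI or Rmpi), and the two prefixes are simply the ones tried in this message. The sketch assumes a POSIX shell with find available and says nothing about whether Rmpi's configure script will accept the directory it prints.

```shell
# Hypothetical helper: print the directory of the first mpi.h found
# under any of the prefixes given as arguments.
find_mpi_header() {
    for prefix in "$@"; do
        # Skip prefixes that do not exist on this machine.
        [ -d "$prefix" ] || continue
        hit=$(find "$prefix" -name mpi.h -print 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            dirname "$hit"
            return 0
        fi
    done
    # Nothing found under any prefix.
    return 1
}

# The two prefixes below are the ones tried in this message.
find_mpi_header /usr/lib64/openmpi /usr/lib/openmpi \
    || echo "mpi.h not found under the prefixes tried"
```

If neither prefix contains the header, that would at least suggest the --with-mpi value is pointing at the wrong place.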
I've tried two different ways of compiling Rmpi:
way 1:
R CMD INSTALL Rmpi_0.5-8.tar.gz --configure-args=--with-mpi=/usr/lib64/openmpi
way 2:
wget http://cran.fhcrc.org/src/contrib/Rmpi_0.5-8.tar.gz
R CMD INSTALL Rmpi_0.5-8.tar.gz --configure-args=--with-mpi=/usr/lib/openmpi
Neither of them works.
Specifically, when I run some code that does nothing but load the Rmpi
library ("library(Rmpi)"), I get this:
sgeadmin@ip-10-112-70-93:~$ mpirun -n 4 --hostfile hostfile R CMD BATCH load.R
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 7878 on
node ip-10-112-70-93 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
If I run some code that actually tries to do something with Rmpi (code
below), I get this:
sgeadmin@ip-10-112-70-93:~$ mpirun -n 4 --hostfile hostfile R CMD BATCH doit.R
[ip-10-112-70-93:07913] [[22941,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../../../../orte/mca/rml/oob/rml_oob_send.c at
line 105
[ip-10-112-70-93:07913] [[22941,0],0] could not get route to [[22941,0],2]
[ip-10-112-70-93:07913] [[22941,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../orte/orted/orted_comm.c at line 130
[ip-10-112-70-93:07913] [[22941,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../../../../orte/mca/rml/oob/rml_oob_send.c at
line 105
[ip-10-112-70-93:07913] [[22941,0],0] could not get route to [[22941,0],2]
[ip-10-112-70-93:07913] [[22941,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../orte/orted/orted_comm.c at line 130
[ip-10-112-70-93:07913] [[22941,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../../../../orte/mca/rml/oob/rml_oob_send.c at
line 105
[ip-10-112-70-93:07913] [[22941,0],0] could not get route to [[22941,0],2]
[ip-10-112-70-93:07913] [[22941,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../orte/orted/orted_comm.c at line 130
[ip-10-112-70-93:07913] [[22941,0],0]:route_callback tried routing
message from [[22941,0],1] to [[22941,0],3]:1, can't find route
[0] func:/usr/lib/libopen-pal.so.0(opal_backtrace_print+0x1f) [0x7f567c5b722f]
[1] func:/usr/lib/openmpi/lib/openmpi/mca_rml_oob.so [0x7f567ab36224]
[2] func:/usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so [0x7f567a92c2a6]
[3] func:/usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so [0x7f567a92d67a]
[4] func:/usr/lib/libopen-pal.so.0 [0x7f567c5a2a98]
[5] func:/usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f567c597099]
[6] func:/usr/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1b5)
[0x7f567c828585]
[7] func:/usr/lib/openmpi/lib/openmpi/mca_plm_rsh.so [0x7f567ad3e9f9]
[8] func:/usr/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x41b)
[0x7f567c826b0b]
[9] func:/usr/lib/libopen-pal.so.0 [0x7f567c5a2a98]
[10] func:/usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f567c597099]
[11] func:/usr/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1b5)
[0x7f567c828585]
[12] func:/usr/lib/openmpi/lib/openmpi/mca_plm_rsh.so [0x7f567ad3e9f9]
[13] func:/usr/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x41b)
[0x7f567c826b0b]
[14] func:/usr/lib/libopen-pal.so.0 [0x7f567c5a2a98]
[15] func:/usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f567c597099]
[16] func:/usr/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1b5)
[0x7f567c828585]
[17] func:/usr/lib/openmpi/lib/openmpi/mca_plm_rsh.so [0x7f567ad3e9f9]
[18] func:/usr/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x41b)
[0x7f567c826b0b]
[19] func:/usr/lib/libopen-pal.so.0 [0x7f567c5a2a98]
[20] func:/usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f567c597099]
[21] func:/usr/lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0xa5)
[0x7f567c828045]
[22] func:/usr/lib/openmpi/lib/openmpi/mca_plm_rsh.so [0x7f567ad3e9da]
[23] func:/usr/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x41b)
[0x7f567c826b0b]
[24] func:/usr/lib/libopen-pal.so.0 [0x7f567c5a2a98]
[25] func:mpirun [0x4039b1]
[26] func:mpirun [0x402e24]
[27] func:/lib/libc.so.6(__libc_start_main+0xfd) [0x7f567b771abd]
[28] func:mpirun [0x402d49]
I also get a bunch of empty ip*.log files. (Earlier, the log files
said "Warning: Rmpi cannot be loaded".)
This is with Rmpi compiled against /usr/lib64/openmpi.
If I do it with Rmpi compiled against /usr/lib/openmpi, I get more or
less the same error:
sgeadmin@ip-10-112-70-93:~$ mpirun -n 4 --hostfile hostfile R CMD BATCH doit.R
[ip-10-112-70-93:08648] [[26300,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../../../../orte/mca/rml/oob/rml_oob_send.c at
line 105
[ip-10-112-70-93:08648] [[26300,0],0] could not get route to [[26300,0],2]
[ip-10-112-70-93:08648] [[26300,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../orte/orted/orted_comm.c at line 130
[ip-10-112-70-93:08648] [[26300,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../../../../orte/mca/rml/oob/rml_oob_send.c at
line 105
[ip-10-112-70-93:08648] [[26300,0],0] could not get route to [[26300,0],2]
[ip-10-112-70-93:08648] [[26300,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../orte/orted/orted_comm.c at line 130
[ip-10-112-70-93:08648] [[26300,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../../../../orte/mca/rml/oob/rml_oob_send.c at
line 105
[ip-10-112-70-93:08648] [[26300,0],0] could not get route to [[26300,0],2]
[ip-10-112-70-93:08648] [[26300,0],0] ORTE_ERROR_LOG: A message is
attempting to be sent to a process whose contact information is
unknown in file ../../../orte/orted/orted_comm.c at line 130
[ip-10-112-70-93:08648] [[26300,0],0]:route_callback tried routing
message from [[26300,0],1] to [[26300,0],3]:1, can't find route
[0] func:/usr/lib/libopen-pal.so.0(opal_backtrace_print+0x1f) [0x7f44356b322f]
[1] func:/usr/lib/openmpi/lib/openmpi/mca_rml_oob.so [0x7f4433c32224]
[2] func:/usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so [0x7f4433a282a6]
[3] func:/usr/lib/openmpi/lib/openmpi/mca_oob_tcp.so [0x7f4433a2967a]
[4] func:/usr/lib/libopen-pal.so.0 [0x7f443569ea98]
[5] func:/usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f4435693099]
[6] func:/usr/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1b5)
[0x7f4435924585]
[7] func:/usr/lib/openmpi/lib/openmpi/mca_plm_rsh.so [0x7f4433e3a9f9]
[8] func:/usr/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x41b)
[0x7f4435922b0b]
[9] func:/usr/lib/libopen-pal.so.0 [0x7f443569ea98]
[10] func:/usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f4435693099]
[11] func:/usr/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1b5)
[0x7f4435924585]
[12] func:/usr/lib/openmpi/lib/openmpi/mca_plm_rsh.so [0x7f4433e3a9f9]
[13] func:/usr/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x41b)
[0x7f4435922b0b]
[14] func:/usr/lib/libopen-pal.so.0 [0x7f443569ea98]
[15] func:/usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f4435693099]
[16] func:/usr/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x1b5)
[0x7f4435924585]
[17] func:/usr/lib/openmpi/lib/openmpi/mca_plm_rsh.so [0x7f4433e3a9f9]
[18] func:/usr/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x41b)
[0x7f4435922b0b]
[19] func:/usr/lib/libopen-pal.so.0 [0x7f443569ea98]
[20] func:/usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f4435693099]
[21] func:/usr/lib/libopen-rte.so.0(orte_plm_base_daemon_callback+0xa5)
[0x7f4435924045]
[22] func:/usr/lib/openmpi/lib/openmpi/mca_plm_rsh.so [0x7f4433e3a9da]
[23] func:/usr/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x41b)
[0x7f4435922b0b]
[24] func:/usr/lib/libopen-pal.so.0 [0x7f443569ea98]
[25] func:mpirun [0x4039b1]
[26] func:mpirun [0x402e24]
[27] func:/lib/libc.so.6(__libc_start_main+0xfd) [0x7f443486dabd]
[28] func:mpirun [0x402d49]
Here's the code in doit.R:
library(ShortRead)
library(Rmpi)
exptPath <- system.file("extdata", package = "ShortRead")
sp <- SolexaPath(exptPath)
mpi.spawn.Rslaves(nsl = 8)
qaSummary <- qa(sp)
mpi.close.Rslaves()
report(qaSummary, dest="report")
I think this has to do with how I compiled Rmpi. Any tips? How did you
guys compile PyMpi or whatever it is that Python uses?
Thanks
Dan