MPI cluster based parallel calculation in R on WestGrid (pbs file)

狂风中的少年 提交于 2019-12-23 05:32:26

问题


I am now dealing with a large dataset and I want to use parallel calculation to accelerate the process. WestGird is a Canadian computing system which has clusters with interconnect.

I use two packages doSNOW and parallel to do parallel jobs. My question is how I should write the pbs file. When I submit the job using qsub, an error occurs: mpirun noticed that the job aborted, but has no info as to the process that caused that situation.

Here is the R script code:

install.packages("fume_1.0.tar.gz")
library(fume)
library(foreach)
library(doSNOW)
load("spei03_df.rdata",.GlobalEnv)

cl <- makeCluster(mpi.universe.size(), type='MPI' )
registerDoSNOW(cl)
MK_grid <- 
  foreach(i=1:6000, .packages="fume",.combine='rbind') %dopar% {
    abc <- mkTrend(as.matrix(spei03_data)[i,])
    data.frame(P_value=abc$`Corrected p.value`, Slope=abc$`Sen's Slope`*10,Zc=abc$Zc)
  }
    stopCluster(cl)
    save(MK_grid,file="MK_grid.rdata")
    mpi.exit()

The "fume" package is download from https://cran.r-project.org/src/contrib/Archive/fume/ .

Here is the pbs file:

#!/bin/bash
#PBS -l nodes=2:ppn=12
#PBS -l walltime=2:00:00 
module load application/R/3.3.1
cd $PBS_O_WORKDIR 

export OMP_NUM_THREADS=1
mpirun -np 1 -hostfile $PBS_NODEFILE R CMD BATCH Trend.R

Can anyone help? Thanks a lot.


回答1:


It's difficult to give advice on how to use a compute cluster that I've never used since each cluster is setup somewhat differently, but I can give you some general advice that may help.

Your job script looks reasonable to me. It's very similar to what I use on one of our Torque/Moab clusters. It's a good idea to verify that you're able to load all of the necessary R packages interactively because sometimes additional module files may need to be loaded. If you need to install packages yourself, make sure you install them in the standard "personal library" which is called something like "~/R/x86_64-pc-linux-gnu-library/3.3". That often avoids errors loading packages in the R script when executing in parallel.

I have more to say about your R script:

  • You need to load the Rmpi package in your R script using library(Rmpi). It isn't automatically loaded when loading doSNOW, so you will get an error when calling mpi.universe.size().

  • I don't recommend installing R packages in the R script itself. That will fail if install.script needs to prompt you for the CRAN repository, for example, since you can't execute interactive functions from an R script executed via mpirun.

  • I suggest starting mpi.universe.size() - 1 cluster workers when calling makeCluster. Since mpirun starts one worker, it may not be safe for makeCluster to spawn mpi.universe.size() additional workers since that would result in a total of mpi.universize.size() + 1 MPI processes. That works on some clusters, but it fails on at least one of our clusters.

  • While debugging, try using the makeCluster outfile='' option. Depending on your MPI installation, that may let you see error messages that would otherwise be hidden.



来源:https://stackoverflow.com/questions/40941241/mpi-cluster-based-parallel-calculation-in-r-on-westgrid-pbs-file

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!