Is there an efficient way to parallelize mapply?

被撕碎了的回忆 asked 2021-01-05 12:22

I have many rows, and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now.
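
For concreteness, here is a minimal, hypothetical version of the setup (the question doesn't show the actual code); get_uniroot, the function being solved, and the column names P, B0, CF1, CF2 and CF3 are illustrative stand-ins inferred from the answers below:

    # Hypothetical stand-in: the asker's real non-linear function isn't
    # shown, so this one is made up purely for illustration.
    get_uniroot <- function(P, B0, CF1, CF2, CF3) {
      f <- function(x) P - B0 * x - CF1 * x^2 - CF2 * x^3 - CF3 * x^4
      # f(0) = P > 0 and f(1e6) < 0, so a root is always bracketed.
      uniroot(f, interval = c(0, 1e6))$root
    }

    # One row per problem instance.
    n  <- 1e4
    df <- data.frame(P = runif(n), B0 = runif(n, 1, 2),
                     CF1 = runif(n), CF2 = runif(n), CF3 = runif(n))

    # The serial version: one uniroot() call per row.
    x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)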

3 Answers
  •  清歌不尽 answered 2021-01-05 12:28

    I'd use the parallel package that has been built into R since 2.14 and work with matrices. You could then simply use mclapply like this:

    library(parallel)

    # Row-wise matrix indexing is much cheaper than data frame indexing.
    dfm <- as.matrix(df)
    # as.list(dfm[x, ]) keeps the column names, so do.call matches them
    # to get_uniroot's arguments by name.
    result <- mclapply(seq_len(nrow(dfm)),
                       function(x) do.call(get_uniroot, as.list(dfm[x, ])),
                       mc.cores = 4L)
    unlist(result)
    

    This does essentially the same thing mapply does, but in parallel.
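
    Incidentally, the same parallel package also provides mcmapply, a forking-based multicore drop-in for mapply itself (so, like mclapply, Unix-alikes only). A minimal sketch, assuming the column names from your serial call:

    library(parallel)
    # mcmapply parallelizes mapply directly; on Windows it only works
    # with mc.cores = 1, since it relies on forking.
    result <- mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,
                       mc.cores = 4L)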

    But...

    Mind you that parallelization always incurs some overhead. As I explained in the question you link to, going parallel only pays off if your inner function takes significantly longer to compute than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider cutting your data frame into bigger chunks and combining both mapply and mclapply. A possible way to do this is:

    library(parallel)

    ncores <- 4
    # Cut 0:nrow(df) at evenly spaced quantiles to get the chunk
    # boundaries, one chunk per core.
    id <- floor(
            quantile(0:nrow(df),
                     1 - (0:ncores)/ncores)
          )
    # embed() pairs up consecutive boundaries: each row of idm holds
    # the (start, end) row numbers of one chunk.
    idm <- embed(id, 2)

    mapply_uniroot <- function(id) {
      # id[1] is the row just before the chunk, id[2] the last row in it.
      tmp <- df[(id[1] + 1):id[2], ]
      mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
    }
    # embed() emits the chunks in reverse order, so iterate nrow(idm):1
    # to get the results back in the original row order.
    result <- mclapply(nrow(idm):1,
                       function(x) mapply_uniroot(idm[x, ]),
                       mc.cores = ncores)
    final <- unlist(result)
    

    This might need some tweaking, but it essentially breaks your df into exactly as many chunks as there are cores and runs the mapply on every core. To show this works:

    > x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
    > all.equal(final,x1)
    [1] TRUE
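
    If you want to check whether going parallel actually pays off on your data, a rough timing comparison does the job. This sketch reuses the objects defined above:

    # Crude but sufficient: compare the elapsed wall-clock time of the
    # serial baseline against the chunked parallel version.
    t_serial <- system.time(
      mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
    )
    t_parallel <- system.time(
      unlist(mclapply(nrow(idm):1,
                      function(x) mapply_uniroot(idm[x, ]),
                      mc.cores = ncores))
    )
    t_serial["elapsed"] / t_parallel["elapsed"]  # rough speedup factor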
    
