Is there an efficient way to parallelize mapply?

被撕碎了的回忆 asked 2021-01-05 12:22

I have many rows, and on every row I compute the uniroot of a non-linear function. I have a quad-core Ubuntu machine which hasn't stopped running my code for two days now.
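
For concreteness, here is a minimal, hypothetical version of the setup (the question doesn't show the actual code); get_uniroot, the function being solved, and the column names P, B0, CF1, CF2 and CF3 are illustrative stand-ins inferred from the answers below:

    # Hypothetical stand-in: the asker's real non-linear function isn't
    # shown, so this one is made up purely for illustration.
    get_uniroot <- function(P, B0, CF1, CF2, CF3) {
      f <- function(x) P - B0 * x - CF1 * x^2 - CF2 * x^3 - CF3 * x^4
      # f(0) = P > 0 and f(1e6) < 0, so a root is always bracketed.
      uniroot(f, interval = c(0, 1e6))$root
    }

    # One row per problem instance.
    n  <- 1e4
    df <- data.frame(P = runif(n), B0 = runif(n, 1, 2),
                     CF1 = runif(n), CF2 = runif(n), CF3 = runif(n))

    # The serial version: one uniroot() call per row.
    x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)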

3 Answers
  •  清歌不尽 answered 2021-01-05 12:28

    I'd use the parallel package that has been built into R since 2.14 and work with matrices. You could then simply use mclapply like this:

    library(parallel)

    # Row-wise matrix indexing is much cheaper than data frame indexing.
    dfm <- as.matrix(df)
    # as.list(dfm[x, ]) keeps the column names, so do.call matches them
    # to get_uniroot's arguments by name.
    result <- mclapply(seq_len(nrow(dfm)),
                       function(x) do.call(get_uniroot, as.list(dfm[x, ])),
                       mc.cores = 4L)
    unlist(result)
    

    This does essentially the same thing mapply does, but in parallel.
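
    Incidentally, the same parallel package also provides mcmapply, a forking-based multicore drop-in for mapply itself (so, like mclapply, Unix-alikes only). A minimal sketch, assuming the column names from your serial call:

    library(parallel)
    # mcmapply parallelizes mapply directly; on Windows it only works
    # with mc.cores = 1, since it relies on forking.
    result <- mcmapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3,
                       mc.cores = 4L)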

    But...

    Mind you that parallelization always incurs some overhead. As I explained in the question you link to, going parallel only pays off if your inner function takes significantly longer to compute than the overhead involved. In your case, your uniroot function works pretty fast. You might then consider cutting your data frame into bigger chunks and combining both mapply and mclapply. A possible way to do this is:

    library(parallel)

    ncores <- 4
    # Cut 0:nrow(df) at evenly spaced quantiles to get the chunk
    # boundaries, one chunk per core.
    id <- floor(
            quantile(0:nrow(df),
                     1 - (0:ncores)/ncores)
          )
    # embed() pairs up consecutive boundaries: each row of idm holds
    # the (start, end) row numbers of one chunk.
    idm <- embed(id, 2)

    mapply_uniroot <- function(id) {
      # id[1] is the row just before the chunk, id[2] the last row in it.
      tmp <- df[(id[1] + 1):id[2], ]
      mapply(get_uniroot, tmp$P, tmp$B0, tmp$CF1, tmp$CF2, tmp$CF3)
    }
    # embed() emits the chunks in reverse order, so iterate nrow(idm):1
    # to get the results back in the original row order.
    result <- mclapply(nrow(idm):1,
                       function(x) mapply_uniroot(idm[x, ]),
                       mc.cores = ncores)
    final <- unlist(result)
    

    This might need some tweaking, but it essentially breaks your df into exactly as many chunks as there are cores and runs the mapply on every core. To show this works:

    > x1 <- mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
    > all.equal(final,x1)
    [1] TRUE
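
    If you want to check whether going parallel actually pays off on your data, a rough timing comparison does the job. This sketch reuses the objects defined above:

    # Crude but sufficient: compare the elapsed wall-clock time of the
    # serial baseline against the chunked parallel version.
    t_serial <- system.time(
      mapply(get_uniroot, df$P, df$B0, df$CF1, df$CF2, df$CF3)
    )
    t_parallel <- system.time(
      unlist(mclapply(nrow(idm):1,
                      function(x) mapply_uniroot(idm[x, ]),
                      mc.cores = ncores))
    )
    t_serial["elapsed"] / t_parallel["elapsed"]  # rough speedup factor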
    
