Parallel version of transform (or mutate) in R?

Submitted by ≯℡__Kan透↙ on 2021-01-29 05:21:11

Question


I have a slow function that I want to apply to each row in a data.frame. The computation is embarrassingly parallel.

I have 4 cores, but R's built-in functions only use one.

All I want to do is a parallel equivalent to:

data$c = slow.foo(data$a, data$b)

I can't find clear instructions on which package to use (I'm overwhelmed by the choice) or how to use it. Any help would be greatly appreciated.


Answer 1:


The parallel package is included with base R. Here's a quick example using parApply from that package:

library(parallel)

# Some dummy data
d <- data.frame(x1=runif(1000), x2=runif(1000))

# Create a cluster with one fewer worker than there are available cores. Adjust as necessary
cl <- makeCluster(detectCores() - 1)

# Just like regular apply, but rows get sent to the various processes
out <- parApply(cl, d, 1, function(x) x[1] - x[2])

stopCluster(cl)

# Same as x1 - x2?
identical(out, d$x1 - d$x2)

# [1] TRUE
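Note that parApply, like apply, first converts the data.frame to a matrix, so all columns are coerced to one common type. For the two-column call in the question, clusterMap (also in parallel) may be a closer fit: it is a parallel mapply, calling the function once per pair of elements. A minimal sketch, assuming slow.foo takes two scalars and returns a single value; the slow.foo defined here is a hypothetical stand-in.

```r
library(parallel)

# Hypothetical stand-in for the question's slow.foo: takes two scalars,
# returns one value. Sys.sleep() only simulates an expensive computation.
slow.foo <- function(a, b) {
  Sys.sleep(0.01)
  a * b + 1
}

data <- data.frame(a = runif(20), b = runif(20))

# Two workers here; detectCores() - 1 as in the answer also works
cl <- makeCluster(2)

# Parallel mapply: call i receives data$a[i] and data$b[i].
# SIMPLIFY = TRUE collapses the list of results into a numeric vector.
data$c <- clusterMap(cl, slow.foo, data$a, data$b, SIMPLIFY = TRUE)

stopCluster(cl)

# Same as the serial version?
all.equal(data$c, data$a * data$b + 1)
# [1] TRUE
```

If rows take very different amounts of time, clusterMap's `.scheduling = "dynamic"` argument hands each new call to whichever worker is free, instead of sending work out in fixed rounds.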

You also have, e.g., parSapply and parLapply at your disposal.
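A common stumbling block with parSapply and parLapply: the default makeCluster workers are separate R sessions that start with an empty workspace, so any functions or objects your slow function depends on must be copied to them with clusterExport. A minimal sketch, assuming a hypothetical slow.foo that calls a hypothetical helper defined at top level:

```r
library(parallel)

# Hypothetical: slow.foo depends on another function defined at top level
helper <- function(v) v^2
slow.foo <- function(a, b) helper(a) + b

data <- data.frame(a = 1:10, b = 11:20)

cl <- makeCluster(2)

# PSOCK workers (the makeCluster default) start with an empty workspace,
# so copy over everything slow.foo needs. environment() is where the
# names above were defined (the global environment at top level).
clusterExport(cl, c("helper", "slow.foo"), envir = environment())

# parSapply over row indices; a and b are passed to every call via `...`
data$c <- parSapply(cl, seq_len(nrow(data)),
                    function(i, a, b) slow.foo(a[i], b[i]),
                    a = data$a, b = data$b)

stopCluster(cl)
```

Without the clusterExport call, the workers would fail with a "could not find function" error for helper or slow.foo.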

Of course, for the example I've given, the vectorised operation d$x1 - d$x2 is much faster. Think about whether your computation can be vectorised rather than performed row by row.
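To illustrate the vectorisation point with something less trivial than a subtraction: a row-wise function containing an `if` can often be rewritten with vectorised functions such as pmax() or ifelse(). A sketch with a hypothetical rowwise_diff; whether slow.foo can be rewritten this way depends on what it actually does.

```r
# Hypothetical row-wise function: the difference if positive, else zero
rowwise_diff <- function(a, b) if (a > b) a - b else 0

d <- data.frame(x1 = runif(1000), x2 = runif(1000))

# Row by row: one R-level function call per row
slow_way <- mapply(rowwise_diff, d$x1, d$x2)

# Vectorised: pmax() handles whole columns in one call, and transform()
# adds the result as a new column (mutate() is the dplyr equivalent)
d <- transform(d, y = pmax(x1 - x2, 0))

identical(slow_way, d$y)
# [1] TRUE
```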



Source: https://stackoverflow.com/questions/24347675/parallel-version-of-transform-or-mutate-in-r
