Question
I have a slow function that I want to apply to each row in a data.frame. The computation is embarrassingly parallel.
I have 4 cores, but R's built-in functions only use one.
All I want is a parallel equivalent of:
```r
data$c = slow.foo(data$a, data$b)
```
I can't find clear instructions on which library to use (I'm overwhelmed by the choice) or how to use it. Any help would be greatly appreciated.
Answer 1:
The `parallel` package is included with base R. Here's a quick example using `parApply` from that package:
```r
library(parallel)

# Some dummy data
d <- data.frame(x1 = runif(1000), x2 = runif(1000))

# Create a cluster with one worker fewer than the number of cores (but at
# least one, so single-core machines still work). Adjust as necessary
cl <- makeCluster(max(1L, detectCores() - 1L, na.rm = TRUE))

# Just like regular apply, but rows get sent to the various worker processes
out <- parApply(cl, d, 1, function(x) x[1] - x[2])
stopCluster(cl)

# Same as x1 - x2?
identical(out, d$x1 - d$x2)
# [1] TRUE
```
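One thing worth knowing before using `parApply` on your own data: like `apply`, it converts the data frame to a matrix first, so a data frame that mixes column types reaches the workers as character rows. A small sketch with made-up data (the `d2` columns are hypothetical, chosen only to show the coercion):

```r
library(parallel)

# Hypothetical mixed-type data: a character id column and a numeric column
d2 <- data.frame(id = c("a", "b", "c"), x = c(1, 2, 3))

cl <- makeCluster(2)

# parApply, like apply, coerces the data frame with as.matrix() first.
# With a character column present, each row arrives as a character vector.
row_types <- parApply(cl, d2, 1, function(row) class(row))

# Convert back explicitly inside the worker function when numbers are needed
doubled <- parApply(cl, d2, 1, function(row) as.numeric(row[["x"]]) * 2)

stopCluster(cl)
print(row_types)
print(doubled)
```

If your data is all numeric this coercion is harmless, which is why the example above works unchanged.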
You also have, e.g., `parSapply` and `parLapply` at your disposal.
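The question's own target, `data$c = slow.foo(data$a, data$b)`, takes two column arguments, so another fit from the same package is `clusterMap`, a parallel analogue of `mapply`. A minimal sketch, assuming a hypothetical `slow.foo` that just multiplies its inputs after a short `Sys.sleep` to mimic expensive work:

```r
library(parallel)

# Hypothetical stand-in for the question's slow.foo(a, b)
slow.foo <- function(a, b) {
  Sys.sleep(0.01)  # simulate an expensive computation
  a * b
}

data <- data.frame(a = runif(100), b = runif(100))

cl <- makeCluster(max(1L, detectCores() - 1L, na.rm = TRUE))
# clusterMap calls slow.foo(a[i], b[i]) on the workers for every i;
# SIMPLIFY = TRUE collapses the list of results into a plain vector
data$c <- clusterMap(cl, slow.foo, data$a, data$b, SIMPLIFY = TRUE)
stopCluster(cl)
```

Passing `slow.foo` itself as the function ships it to the workers. If it relies on other functions or objects from your session, send those with `clusterExport()` first; for the same reason, an anonymous wrapper such as `function(i) slow.foo(data$a[i], data$b[i])` fails on socket workers, because `data` and `slow.foo` do not exist there.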
Of course, for the example I've given, the vectorised operation `d$x1 - d$x2` is much faster. Think about whether your computation can be vectorised rather than performed row by row.
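Vectorising and parallelising can also be combined: if the real function is slow but vectorised, one common pattern is to split the rows into one chunk per worker and call the function once per chunk, which avoids shipping every row separately. A sketch, assuming a hypothetical vectorised `fast.foo` and two workers:

```r
library(parallel)

# Hypothetical vectorised function: operates on whole columns at once
fast.foo <- function(a, b) a * b

data <- data.frame(a = runif(1e5), b = runif(1e5))
n_workers <- 2L

# splitIndices() gives contiguous, ordered row ranges, one per worker;
# only each chunk (not the whole data frame) is sent to a worker
chunk_list <- lapply(splitIndices(nrow(data), n_workers),
                     function(idx) data[idx, ])

cl <- makeCluster(n_workers)
# Each worker calls the vectorised function once on its whole chunk
pieces <- parLapply(cl, chunk_list,
                    function(chunk, f) f(chunk$a, chunk$b), f = fast.foo)
stopCluster(cl)

# Chunks come back in order, so concatenating restores row order
data$c <- unlist(pieces, use.names = FALSE)
```

For a function as cheap as this one, `data$a * data$b` on the main process will still be faster; chunking only pays off when each call does substantial work.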
Source: https://stackoverflow.com/questions/24347675/parallel-version-of-transform-or-mutate-in-r