I am fond of the parallel package in R and how easy and intuitive it is to do parallel versions of apply, sapply, etc.
Is there a parallel version of replicate?
You can just use the parallel versions of lapply or sapply: instead of saying "replicate this expression n times", you apply over 1:n, and instead of passing an expression, you wrap that expression in a function that ignores the argument sent to it.
Possibly something like:
# create the cluster
library(parallel)
cl <- makeCluster(detectCores() - 1)
# load the library support the code will need on each worker
clusterEvalQ(cl, library(MASS))
# put objects in place that might be needed for the code
myData <- data.frame(x = 1:10, y = rnorm(10))
clusterExport(cl, "myData")
# give each member of the cluster its own RNG stream (just in case)
clusterSetRNGStream(cl)
# ... then the parallel replicate ...
parSapply(cl, 1:10000, function(i, ...) { x <- rnorm(10); mean(x)/sd(x) })
# stop the cluster
stopCluster(cl)
as the parallel equivalent of:
replicate(10000, { x <- rnorm(10); mean(x)/sd(x) })
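One note on the clusterSetRNGStream(cl) call above: passing an explicit iseed makes the worker streams reproducible, which matters if you need the simulation to give the same answer twice. A minimal self-contained sketch:
library(parallel)
cl <- makeCluster(2)
clusterSetRNGStream(cl, iseed = 123)  # explicit seed: reproducible streams
run1 <- parSapply(cl, 1:1000, function(i) mean(rnorm(10)))
clusterSetRNGStream(cl, iseed = 123)  # reset every worker to the same stream
run2 <- parSapply(cl, 1:1000, function(i) mean(rnorm(10)))
identical(run1, run2)                 # TRUE
stopCluster(cl)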
Using clusterEvalQ as a model, I think I would implement a parallel replicate as:
# substitute() captures expr unevaluated; each worker then evaluates it once
# per dummy element of integer(n), in its own global environment
parReplicate <- function(cl, n, expr, simplify = TRUE, USE.NAMES = TRUE)
  parSapply(cl, integer(n), function(i, ex) eval(ex, envir = .GlobalEnv),
            substitute(expr), simplify = simplify, USE.NAMES = USE.NAMES)
The arguments simplify and USE.NAMES are compatible with sapply rather than replicate, but they make it a better wrapper around parSapply in my opinion.
Here's an example derived from the replicate man page:
library(parallel)
cl <- makePSOCKcluster(3)
hist(parReplicate(cl, 100, mean(rexp(10))))
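One caveat: the captured expression is evaluated in each worker's global environment, so any objects it references have to be exported first. A quick sketch, using a hypothetical n.obs purely for illustration:
n.obs <- 25                 # hypothetical object, defined on the master
clusterExport(cl, "n.obs")  # ship it to each worker's global environment
hist(parReplicate(cl, 100, mean(rexp(n.obs))))
stopCluster(cl)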
This is the best I could come up with:
cl <- makeCluster(getOption("cl.cores", 4))
clusterExport(cl, "simulate_fxns")  # the workers need the function's definition
# clusterCall() takes a function, not an evaluated expression, so wrap the
# replicate() call; each of the 4 workers then runs 50 replicates
res <- clusterCall(cl, function() replicate(50, simulate_fxns()))
stopCluster(cl)
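Since clusterCall() returns one element per worker (here, four length-50 results), the draws still need combining afterwards; a minimal sketch, assuming simulate_fxns() returns a single number:
all_sims <- unlist(res)  # 4 workers x 50 replicates each = 200 draws
length(all_sims)         # 200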