Executing glmnet in parallel in R

Posted by 岁酱吖の on 2019-11-30 03:51:11

In order to execute "cv.glmnet" in parallel, you have to specify the parallel=TRUE option, and register a foreach parallel backend. This allows you to choose the parallel backend that works best for your computing environment.

Here's the documentation for the "parallel" argument from the cv.glmnet man page:

parallel: If 'TRUE', use parallel 'foreach' to fit each fold. Must register parallel before hand, such as 'doMC' or others. See the example below.

Here's an example using the doParallel package which works on Windows, Mac OS X, and Linux:

library(glmnet)
library(doParallel)
registerDoParallel(4)  # register a foreach backend with four workers
m <- cv.glmnet(x, target[,1], family="binomial", alpha=0, type.measure="auc",
               grouped=FALSE, standardize=FALSE, parallel=TRUE)

This call to cv.glmnet will execute in parallel using four workers. On Linux and Mac OS X, it will execute the tasks using "mclapply", while on Windows it will use "clusterApplyLB".
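To confirm that the backend was registered as expected, foreach provides a few query functions. The following is a minimal sketch, assuming doParallel is installed; the worker count of 2 is arbitrary and only keeps the example light.

```r
library(doParallel)  # also attaches foreach and parallel

registerDoParallel(2)            # register two workers (arbitrary choice)

getDoParRegistered()             # TRUE once a backend has been registered
getDoParName()                   # backend name; differs between Windows and Unix-alikes
n_workers <- getDoParWorkers()   # number of workers foreach will use
print(n_workers)

stopImplicitCluster()            # release the implicit cluster (matters on Windows)
```

If getDoParWorkers() reports 1, cv.glmnet(..., parallel=TRUE) will still run, but effectively sequentially.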

Nested parallelism gets tricky, and may not help a lot with only 4 workers. I would try using a normal for loop around cv.glmnet (as in your second example) with a parallel backend registered and see what the performance is before adding another level of parallelism.
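One way to check that performance is to time the plain loop with and without parallel=TRUE. The sketch below is only an illustration: the data are simulated stand-ins for "x" and "target", the helper fit_all() is hypothetical, and the worker count of 2 is arbitrary.

```r
library(glmnet)
library(doParallel)

# Simulated stand-ins for the x and target objects from the question
set.seed(1)
n <- 500; p <- 20; n_targets <- 3
x <- matrix(rnorm(n * p), n, p)
target <- matrix(rbinom(n * n_targets, 1, 0.5), n, n_targets)

registerDoParallel(2)

# Hypothetical helper: ordinary for loop over targets, one cv.glmnet per column
fit_all <- function(parallel) {
  fits <- vector("list", ncol(target))
  for (ii in seq_len(ncol(target))) {
    fits[[ii]] <- cv.glmnet(x, target[, ii], family = "binomial", alpha = 0,
                            type.measure = "auc", parallel = parallel)
  }
  fits
}

t_seq <- system.time(fits_seq <- fit_all(FALSE))  # folds fitted one after another
t_par <- system.time(fits_par <- fit_all(TRUE))   # folds fitted via the foreach backend

print(c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]]))

stopImplicitCluster()
```

On small problems like this one the parallel run may well be slower because of the communication overhead, which is exactly what the comparison is meant to reveal.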

Also note that the assignment to "model" in your first example isn't going to work when you register a parallel backend. When running in parallel, side-effects generally get thrown away, as with most parallel programming packages.
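The following sketch illustrates the point with a toy computation rather than cv.glmnet. The list "model" here is a hypothetical stand-in for the one in the question, and the names "ignored" and "model_fit" are made up for the example.

```r
library(doParallel)
registerDoParallel(2)

model <- vector("list", 3)   # hypothetical container, filled with NULLs

# Assigning inside the loop body only changes a copy on the worker
ignored <- foreach(i = 1:3) %dopar% {
  model[[i]] <- i^2
  NULL
}
all_null <- all(vapply(model, is.null, logical(1)))
print(all_null)              # the list in the main session is unchanged

# Returning the value from the loop body is what works
model_fit <- foreach(i = 1:3) %dopar% {
  i^2
}
str(model_fit)

stopImplicitCluster()
```

In other words, let foreach collect the return value of each iteration instead of assigning to an outer variable.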

Stumbled upon this old thread and thought it would be useful to mention that with the future framework, it is possible to do nested and parallel foreach() calls. For instance, assume you have three local machines (with SSH access) and you want to run four cores on each; then you can use:

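library("glmnet")
library("doFuture")
registerDoFuture()
# Outer level: one R session per machine; inner level: four local workers each.
# 'multisession' stands in for 'multiprocess', which recent versions of future no longer support.
plan(list(
  tweak(cluster, workers = c("machine1", "machine2", "machine3")),
  tweak(multisession, workers = 4L)
))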


model_fit <- foreach(ii = seq_len(ncol(target))) %dopar% {
  cv.glmnet(x, target[,ii], family = "binomial", alpha = 0,
            type.measure = "auc", grouped = FALSE, standardize = FALSE,
            parallel = TRUE)
}
str(model_fit)

The "outer" foreach-loop will iterate over the targets such that each iteration is processed by a separate machine. Each iteration will in turn process cv.glmnet() using four workers on whatever machine it ends up on.

Of course, if you only have access to a single machine, then it makes little sense to do nested parallel processing. In such cases, you can use:

plan(list(
  sequential,
  tweak(multisession, workers = 4L)
))

to parallelize the cv.glmnet() call, or alternatively,

plan(list(
  tweak(multisession, workers = 4L),
  sequential
))

or, equivalently, just plan(multisession, workers = 4L), to parallelize over targets.
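For reference, here is a self-contained single-machine sketch of parallelizing over targets with doFuture. It makes several assumptions: the data are simulated stand-ins for "x" and "target", the worker count of 2 is arbitrary, the multisession backend is used, and a fixed "foldid" is passed so that the workers do not need to draw random numbers.

```r
library(glmnet)
library(doFuture)   # also attaches foreach and future

# Simulated stand-ins for the x and target objects from the question
set.seed(1)
n <- 300; p <- 10; n_targets <- 3
x <- matrix(rnorm(n * p), n, p)
target <- matrix(rbinom(n * n_targets, 1, 0.5), n, n_targets)

# Fixed fold assignment, created once in the main session
foldid <- sample(rep(1:5, length.out = n))

registerDoFuture()
plan(multisession, workers = 2L)   # two background R sessions (arbitrary)

# Each target is fitted on its own worker; cv.glmnet itself runs sequentially
model_fit <- foreach(ii = seq_len(ncol(target))) %dopar% {
  cv.glmnet(x, target[, ii], family = "binomial", alpha = 0,
            type.measure = "auc", foldid = foldid, parallel = FALSE)
}

plan(sequential)   # shut down the background workers
print(length(model_fit))
```

Sharing one foldid across targets also makes the cross-validated results comparable between them, though that is a design choice rather than a requirement.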
