executing glmnet in parallel in R

春和景丽 2020-12-28 09:06

My training dataset has about 200,000 records and 500 features. (These are sales data from a retail org.) Most of the features are 0/1 and are stored as a sparse matri

2 answers
  •  悲&欢浪女
    2020-12-28 09:56

    Stumbled upon this old thread and thought it would be useful to mention that with the future framework, it is possible to do nested and parallel foreach() calls. For instance, assume you have three local machines (with SSH access) and you want to run four cores on each; then you can use:

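    library("glmnet")
    library("doFuture")
    registerDoFuture()   # route %dopar% calls through the future framework
    # multisession replaces multiprocess, which recent releases of future deprecate
    plan(list(
      tweak(cluster, workers = c("machine1", "machine2", "machine3")),  # outer level: one machine per target
      tweak(multisession, workers = 4L)                                 # inner level: 4 workers per machine
    ))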
    
    
    model_fit <- foreach(ii = seq_len(ncol(target))) %dopar% {
      cv.glmnet(x, target[,ii], family = "binomial", alpha = 0,
                type.measure = "auc", grouped = FALSE, standardize = FALSE,
                parallel = TRUE)
    }
    str(model_fit)
    

    The "outer" foreach-loop will iterate over the targets such that each iteration is processed by a separate machine. Each iteration will in turn process cv.glmnet() using four workers on whatever machine it ends up on.

    Of course, if you only have access to a single machine, it makes little sense to do nested parallel processing. In such cases, you can use:

    plan(list(
      sequential,                         # outer level: iterate over targets one at a time
      tweak(multisession, workers = 4L)   # inner level: 4 workers for cv.glmnet()
    ))
    

    to parallelize the cv.glmnet() call, or alternatively,

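    plan(list(
      tweak(multisession, workers = 4L),  # outer level: 4 targets processed at a time
      sequential                          # inner level: cv.glmnet() runs its folds sequentially
    ))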
    

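    or, equivalently, just plan(multisession, workers = 4L), to parallelize over targets. (multisession is the current replacement for multiprocess, which recent releases of future deprecate.)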
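
    The single-machine case that parallelizes over targets can be sketched end to end as follows. This is only an illustration under stated assumptions: x and target are small synthetic stand-ins for the real data, the sizes n and p, the two workers, and the foldid vector are made up for the example, and multisession is used where the answer above wrote multiprocess. Fixing foldid up front is one way to keep the workers from drawing random numbers, so the folds are the same for every target.

```r
library(Matrix)
library(glmnet)
library(foreach)
library(future)
library(doFuture)

set.seed(1)
n <- 400L; p <- 20L   # hypothetical sizes, far smaller than the real data

# Synthetic stand-in for the 0/1 sparse feature matrix.
x <- Matrix(as.numeric(rbinom(n * p, 1, 0.2)), nrow = n, ncol = p, sparse = TRUE)

# Two synthetic binary targets, one per column.
target <- cbind(
  y1 = rbinom(n, 1, plogis(2 * x[, 1] - 1)),
  y2 = rbinom(n, 1, plogis(2 * x[, 2] - 1))
)

# Assign folds once, in the main session, so the workers use no RNG.
foldid <- sample(rep(seq_len(5L), length.out = n))

registerDoFuture()
plan(multisession, workers = 2L)   # parallelize over targets only

model_fit <- foreach(ii = seq_len(ncol(target))) %dopar% {
  glmnet::cv.glmnet(x, target[, ii], family = "binomial", alpha = 0,
                    type.measure = "auc", foldid = foldid)
}

plan(sequential)   # shut down the background workers

# Best cross-validated AUC per target.
sapply(model_fit, function(fit) max(fit$cvm))
```

    Here parallel = TRUE is left out of cv.glmnet() on purpose: with a single-level plan the inner loop would run sequentially anyway, so all workers are spent on the outer loop over targets.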
