I want to parallelize the model fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the `nthread` parameter controls the number of threads to use while fitting the models, in the sense that the trees are built in parallel. Caret's `train` function, on the other hand, performs parallelization at the resampling level, for example by running one process per iteration of a k-fold CV. Is this understanding correct? If so, is it better to:
- Register the number of cores (for example, with the `doMC` package and the `registerDoMC` function), set `nthread = 1` via caret's `train` function so that it passes that parameter to xgboost, set `allowParallel = TRUE` in `trainControl`, and let caret handle the parallelization for the cross-validation; or
- Disable caret's parallelization (`allowParallel = FALSE` and no parallel back-end registered) and set `nthread` to the number of physical cores, so that the parallelization is contained exclusively within xgboost.
Or is there no "better" way to perform the parallelization?
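For concreteness, here is a minimal sketch of how I understand each setup (assuming a 4-core machine and the `doMC` back-end on a Unix-alike; the data set and CV settings are placeholders, not part of my actual problem):

```r
library(caret)   # train(), trainControl(), twoClassSim()
library(doMC)    # multicore back-end for foreach (Unix-alikes only)

set.seed(1)
dat <- twoClassSim(1000)   # placeholder data set

## Option 1: caret parallelizes the CV folds; xgboost stays single-threaded
registerDoMC(cores = 4)    # assumes 4 physical cores
ctrl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
fit_caret_par <- train(Class ~ ., data = dat, method = "xgbTree",
                       trControl = ctrl,
                       nthread = 1)   # passed through to xgboost

## Option 2: no parallel back-end; xgboost parallelizes the tree building
foreach::registerDoSEQ()   # make sure the folds run sequentially
ctrl <- trainControl(method = "cv", number = 5, allowParallel = FALSE)
fit_xgb_par <- train(Class ~ ., data = dat, method = "xgbTree",
                     trControl = ctrl,
                     nthread = 4)     # xgboost uses all 4 cores
```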
Edit: I ran the code suggested by @topepo, with `tuneLength = 10` and `search = "random"`, and specifying `nthread = 1` on the last line (otherwise, as I understand it, xgboost will use multithreading).
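I won't reproduce @topepo's script verbatim; what follows is my own reconstruction of that kind of timing harness (the data set, fold count, and core count are placeholders; `tuneLength = 10`, `search = "random"`, and the `nthread = 1` on the final `train` call match what I describe above):

```r
library(caret)   # train(), trainControl(), twoClassSim()
library(doMC)    # multicore back-end for foreach (Unix-alikes only)

set.seed(1)
dat <- twoClassSim(1000)   # placeholder data set

ctrl_seq <- trainControl(method = "cv", number = 5,
                         search = "random", allowParallel = FALSE)
ctrl_par <- trainControl(method = "cv", number = 5,
                         search = "random", allowParallel = TRUE)

## fully sequential: no back-end registered, single-threaded xgboost
foreach::registerDoSEQ()
set.seed(2)   # same random candidate set for every timing run
just_seq <- system.time(
  train(Class ~ ., data = dat, method = "xgbTree",
        trControl = ctrl_seq, tuneLength = 10, nthread = 1)
)

## xgboost-level parallelism only: nthread left unset, so xgboost
## defaults to all available threads
set.seed(2)
xgb_par <- system.time(
  train(Class ~ ., data = dat, method = "xgbTree",
        trControl = ctrl_seq, tuneLength = 10)
)

## caret-level parallelism only: the call where I added nthread = 1
registerDoMC(cores = 4)
set.seed(2)
mc_par <- system.time(
  train(Class ~ ., data = dat, method = "xgbTree",
        trControl = ctrl_par, tuneLength = 10, nthread = 1)
)

## elapsed-time ratios, as reported below
just_seq[3] / mc_par[3]
just_seq[3] / xgb_par[3]
xgb_par[3] / mc_par[3]
```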
These are the results I got:

| Quantity | Value |
| --- | --- |
| `xgb_par[3]` (elapsed, s) | 283.691 |
| `just_seq[3]` (elapsed, s) | 276.704 |
| `mc_par[3]` (elapsed, s) | 89.074 |
| `just_seq[3] / mc_par[3]` | 3.106451 |
| `just_seq[3] / xgb_par[3]` | 0.9753711 |
| `xgb_par[3] / mc_par[3]` | 3.184891 |

In the end, it turned out that both for my data and for this test case, letting caret handle the parallelization was the better choice in terms of runtime.