I want to parallelize the model fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the `nthread` parameter controls the number of threads to use while fitting the models, in the sense that the trees are built in parallel. Caret's `train` function, on the other hand, performs parallelization at the resampling level, for example by running one process per iteration of a k-fold CV. Is this understanding correct? If so, is it better to:
- Register the number of cores (for example, with the `doMC` package and the `registerDoMC` function), set `nthread = 1` via caret's `train` function so that it passes that parameter to xgboost, set `allowParallel = TRUE` in `trainControl`, and let caret handle the parallelization for the cross-validation; or
- Disable caret's parallelization (`allowParallel = FALSE` and no parallel back-end registered) and set `nthread` to the number of physical cores, so that the parallelization is contained exclusively within xgboost.
Or is there no "better" way to perform the parallelization?
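For concreteness, here is a minimal sketch of how I understand each setup (assuming a 4-core machine and the `doMC` back-end on a Unix-alike; the data set and CV settings are placeholders, not part of my actual problem):

```r
library(caret)   # train(), trainControl(), twoClassSim()
library(doMC)    # multicore back-end for foreach (Unix-alikes only)

set.seed(1)
dat <- twoClassSim(1000)   # placeholder data set

## Option 1: caret parallelizes the CV folds; xgboost stays single-threaded
registerDoMC(cores = 4)    # assumes 4 physical cores
ctrl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)
fit_caret_par <- train(Class ~ ., data = dat, method = "xgbTree",
                       trControl = ctrl,
                       nthread = 1)   # passed through to xgboost

## Option 2: no parallel back-end; xgboost parallelizes the tree building
foreach::registerDoSEQ()   # make sure the folds run sequentially
ctrl <- trainControl(method = "cv", number = 5, allowParallel = FALSE)
fit_xgb_par <- train(Class ~ ., data = dat, method = "xgbTree",
                     trControl = ctrl,
                     nthread = 4)     # xgboost uses all 4 cores
```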
Edit: I ran the code suggested by @topepo, with `tuneLength = 10` and `search = "random"`, and specifying `nthread = 1` on the last line (otherwise, as I understand it, xgboost will use multithreading).
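I won't reproduce @topepo's script verbatim; what follows is my own reconstruction of that kind of timing harness (the data set, fold count, and core count are placeholders; `tuneLength = 10`, `search = "random"`, and the `nthread = 1` on the final `train` call match what I describe above):

```r
library(caret)   # train(), trainControl(), twoClassSim()
library(doMC)    # multicore back-end for foreach (Unix-alikes only)

set.seed(1)
dat <- twoClassSim(1000)   # placeholder data set

ctrl_seq <- trainControl(method = "cv", number = 5,
                         search = "random", allowParallel = FALSE)
ctrl_par <- trainControl(method = "cv", number = 5,
                         search = "random", allowParallel = TRUE)

## fully sequential: no back-end registered, single-threaded xgboost
foreach::registerDoSEQ()
set.seed(2)   # same random candidate set for every timing run
just_seq <- system.time(
  train(Class ~ ., data = dat, method = "xgbTree",
        trControl = ctrl_seq, tuneLength = 10, nthread = 1)
)

## xgboost-level parallelism only: nthread left unset, so xgboost
## defaults to all available threads
set.seed(2)
xgb_par <- system.time(
  train(Class ~ ., data = dat, method = "xgbTree",
        trControl = ctrl_seq, tuneLength = 10)
)

## caret-level parallelism only: the call where I added nthread = 1
registerDoMC(cores = 4)
set.seed(2)
mc_par <- system.time(
  train(Class ~ ., data = dat, method = "xgbTree",
        trControl = ctrl_par, tuneLength = 10, nthread = 1)
)

## elapsed-time ratios, as reported below
just_seq[3] / mc_par[3]
just_seq[3] / xgb_par[3]
xgb_par[3] / mc_par[3]
```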
These are the results I got:

| Quantity | Value |
| --- | --- |
| `xgb_par[3]` (elapsed, s) | 283.691 |
| `just_seq[3]` (elapsed, s) | 276.704 |
| `mc_par[3]` (elapsed, s) | 89.074 |
| `just_seq[3] / mc_par[3]` | 3.106451 |
| `just_seq[3] / xgb_par[3]` | 0.9753711 |
| `xgb_par[3] / mc_par[3]` | 3.184891 |

In the end, it turned out that both for my data and for this test case, letting caret handle the parallelization was the better choice in terms of runtime.