Set seed for parallel random forest in caret for reproducible results

Submitted by 本小妞迷上赌 on 2020-01-13 19:57:05

Question


I want to run a random forest in parallel with the caret package and set the seeds so that the results are reproducible, as described in "Fully reproducible parallel models using caret". However, I don't understand line 9 of the following code, taken from the caret help: why does it sample 22 integers for each resample (plus 1 for the final model in line 12, so 23) when only 12 values of the parameter k are evaluated? For context, I want to run 5-fold CV to evaluate 584 values of the RF parameter 'mtry'. Any help is much appreciated. Thank you.

## Not run:

## Do 5 repeats of 10-Fold CV for the iris data. We will fit
## a KNN model that evaluates 12 values of k and set the seed
## at each iteration.

set.seed(123)
seeds <- vector(mode = "list", length = 51)
for(i in 1:50) seeds[[i]] <- sample.int(1000, 22) # Why 22?

## For the last model:
seeds[[51]] <- sample.int(1000, 1)

ctrl <- trainControl(method = "repeatedcv",
                     repeats = 5,
                     seeds = seeds)

Answer 1:


I'd say it is a mistake, and should be 12 instead of 22.

As I understand it, 5 repeats of 10-fold CV give 10*5 = 50 resamples, and within each resample the model is fit once for every value of k. Hence, for each i in 1:50, you need 12 seeds (one for every k). After the best k is found, the final model is fit. That fit needs only one seed (no more repeated resampling).
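Applying the same reasoning to the setup in the question (5-fold CV over 584 values of mtry) gives a list of 5 + 1 elements: 584 seeds for each of the 5 resamples and one seed for the final model. The following is a minimal sketch under that assumption, not a verified recipe. The names `n_resamples` and `n_models` are illustrative, the seed range of 10000 is arbitrary, and it assumes each mtry value is fit as a separate model within a resample.

```r
# Sketch: seeds for 5-fold CV over 584 candidate mtry values.
n_resamples <- 5     # 5-fold CV, no repeats (from the question)
n_models    <- 584   # number of mtry values to evaluate (from the question)

set.seed(123)
seeds <- vector(mode = "list", length = n_resamples + 1)
for (i in seq_len(n_resamples)) {
  # One integer seed per candidate model in resample i
  seeds[[i]] <- sample.int(10000, n_models)
}

# A single seed for the final model fit
seeds[[n_resamples + 1]] <- sample.int(10000, 1)

# Hand the list to caret; this part only runs if caret is installed.
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::trainControl(method = "cv",
                              number = n_resamples,
                              seeds = seeds)
}
```

The resulting `ctrl` object would then be passed to `train()` through its `trControl` argument, as in the caret help example quoted in the question.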



Source: https://stackoverflow.com/questions/27944558/set-seed-parallel-random-forest-in-caret-for-reproducible-result
