Different results with randomForest() and caret's randomForest (method = “rf”)

眼角桃花 2020-12-14 23:41

I am new to caret, and I just want to ensure that I fully understand what it’s doing. Towards that end, I’ve been attempting to replicate the results I get from a randomFore

1 Answer
  • 2020-12-15 00:08

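    Using the formula interface in train() converts factors into dummy variables, so the forest is grown on different predictors than the ones randomForest() sees. To compare results from caret with randomForest, you should use the non-formula (x, y) interface.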
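To see the difference the interface makes, here is a small base-R sketch on the CO2 data. It uses model.matrix() as a stand-in for the factor-to-dummy expansion a formula interface performs (an illustration under that assumption, not caret's exact internal code) and compares the resulting columns with the four predictors the x/y interface passes through unchanged.

```r
# Base-R sketch: compare what the x/y interface passes on with a dummy-expanded
# design matrix. model.matrix() stands in for the expansion a formula interface
# performs; it is an illustration, not caret's internal code.
co2 <- as.data.frame(CO2)   # plain data.frame (drops the groupedData classes)

# Non-formula (x, y) interface: the predictors are handed over as they are
x <- co2[, -5]              # column 5 is the response, uptake
print(vapply(x, function(v) class(v)[1], character(1)))
# Plant is an ordered factor; Type and Treatment are unordered factors

# Formula-style expansion: each factor becomes one or more numeric columns
mm <- model.matrix(uptake ~ ., data = co2)[, -1]   # drop the intercept column
print(colnames(mm))
cat("x/y predictors:", ncol(x), "| expanded columns:", ncol(mm), "\n")
```

With an expansion like this, randomForest() would receive 14 numeric columns instead of four predictors, so mtry = 2 draws its split candidates from a different pool and the two fits cannot be expected to match.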

    In your case, you should also provide seeds through the seeds argument of trainControl to get the same result as with randomForest.

    The model training section of the caret website has some notes on reproducibility that explain how to use seeds.
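The reason the seed has to be set right before each fit can be shown without either package. In this base-R sketch, sample() with replacement stands in for the bootstrap draw each tree is grown on (an illustration, not randomForest's internal code): reseeding reproduces the draw exactly, while drawing again without reseeding gives a different sample.

```r
# Sketch: why the seed must be set immediately before each fit.
# sample(n, replace = TRUE) stands in for the bootstrap draw behind each tree.
n <- nrow(CO2)                        # 84 rows

set.seed(1)
boot_a <- sample(n, replace = TRUE)   # draw right after seeding

set.seed(1)
boot_b <- sample(n, replace = TRUE)   # same seed, so the same draw

boot_c <- sample(n, replace = TRUE)   # no reseed: the RNG stream has moved on

print(identical(boot_a, boot_b))      # reproduced
print(identical(boot_a, boot_c))      # not reproduced
```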

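    library("randomForest")
    
    # Reference fit; the seed fixes the bootstrap samples and the
    # candidate variables tried at each split
    set.seed(1)
    rf.model <- randomForest(uptake ~ ., 
                             data = CO2,
                             ntree = 50,
                             nodesize = 5,
                             mtry = 2,
                             importance = TRUE)
    
    library("caret")
    
    # Same model through caret's non-formula (x, y) interface, so the factor
    # columns are not expanded into dummies; column 5 of CO2 is uptake.
    # ntree, nodesize and importance are passed on to randomForest().
    caret.oob.model <- train(CO2[, -5], CO2$uptake, 
                             method = "rf",
                             ntree = 50,
                             tuneGrid = data.frame(mtry = 2),
                             nodesize = 5,
                             importance = TRUE, 
                             metric = "RMSE",
                             # seeds = 1 mirrors set.seed(1) above;
                             # allowParallel belongs to trainControl, not train
                             trControl = trainControl(method = "oob", seeds = 1,
                                                      allowParallel = FALSE))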
    

    If you are doing resampling, you should provide seeds for each resampling iteration (one per tuning-parameter combination) and an additional one for the final model. The examples in ?trainControl show how to create them.
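A seeds list with the layout ?trainControl describes can be built with a small helper. make_seeds() below is hypothetical (it is not part of caret): B is the number of resamples, M the number of rows in tuneGrid, and final_seed the seed for the final model.

```r
# Hypothetical helper, not part of caret: builds a seeds list in the layout
# documented in ?trainControl. Elements 1..B hold M integers each (one per
# tuning-grid row); element B + 1 holds the single seed for the final model.
make_seeds <- function(B, M, final_seed = 1L) {
  seeds <- lapply(seq_len(B), function(b) as.integer((b - 1L) * M + seq_len(M)))
  seeds[[B + 1L]] <- as.integer(final_seed)
  seeds
}

# 25 bootstrap resamples (the trainControl default) and a one-row tuneGrid,
# as with tuneGrid = data.frame(mtry = 2)
seeds_boot <- make_seeds(B = 25, M = 1)
str(seeds_boot[c(1, 2, 26)])

# A three-row grid such as data.frame(mtry = 1:3) needs three seeds per resample
seeds_grid <- make_seeds(B = 25, M = 3)
print(lengths(seeds_grid)[c(1, 25, 26)])
```

With M = 1 this yields the same seed values as the hand-built list in the example below (1 to 25, then 1 for the final model), so it could be passed as trainControl(method = "boot", seeds = make_seeds(25, 1)).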

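    In the following example there are 25 bootstrap resamples (the trainControl default), so the list has 26 elements; the last seed is for the final model and I set it to 1 to match set.seed(1) in the randomForest call.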

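    # 25 resamples x 1 tuning-grid row, plus one seed for the final model
    seeds <- as.list(1:26)
    
    # For the final model: same seed as set.seed(1) in the randomForest() call
    seeds[[26]] <- 1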
    
    caret.boot.model <- train(CO2[, -5], CO2$uptake, 
                              method = "rf",
                              ntree = 50,
                              tuneGrid = data.frame(mtry = 2),
                              nodesize = 5,
                              importance = TRUE, 
                              metric = "RMSE",
                              trControl = trainControl(method = "boot", seeds = seeds,
                                                       allowParallel = FALSE))
    

    By correctly using the non-formula interface with caret and setting the seeds in trainControl, you will get the same results in all three models:

    rf.model
    caret.oob.model$finalModel
    caret.boot.model$finalModel
    