Caret returns different predictions with caret train object than it does with the extracted final model

落花浮王杯 提交于 2021-02-19 05:25:47

问题


I prefer to use caret when fitting models because of its relative speed and preprocessing capabilities. However, I'm slightly confused on how it makes predictions. When comparing predictions made directly from the train object and predictions made from the extracted final model, I'm seeing very different numbers. The predictions from the train object appear to be more accurate.

library(caret)
library(ranger)

x1 <- rnorm(100)
x2 <- rbeta(100, 1, 1)

y <- 2*x1 + x2 + 5*x1*x2

data <- data.frame(x1, x2, y)
fitRanger <- train(y ~ x1 + x2, data = data,
                   method = 'ranger', 
                   tuneLength = 1,
                   preProcess = c('knnImpute', 'center', 'scale'))

predict.data <- data.frame(x1 = rnorm(10), x2 = rbeta(10, 1, 1))
prediction1 <- predict(fitRanger, newdata = predict.data)
prediction2 <- predict(fitRanger$finalModel, data = predict.data)$prediction

results <- data.frame(prediction1, prediction2)
results

I'm positive it has something to do with how I preprocess the data in the train object, but even when I preprocess the test data and use the Ranger model to make predictions, the values are different

predict.data.processed <- predict.data %>% 
                             preProcess(method = c('knnImpute', 
                                                   'center', 
                                                   'scale')) %>% .$data

results3 <- predict(fitRanger$finalModel, data = predict.data.processed)$prediction

results <- cbind(results, results3)
results

I want to extract the predictions from each individual tree in the ranger model, which I can't do in caret. Any thoughts?


回答1:


In order to get the same predictions from the final model as with caret train you should pre-process the data in the same way. Using your example with set.seed(1):

caret predict:

prediction1 <- predict(fitRanger,
                       newdata = predict.data)

ranger predict on the final model. caret pre process was used on predict.data

prediction2 <- predict(fitRanger$finalModel,
                       data = predict(fitRanger$preProcess,
                                      predict.data))$prediction

all.equal(prediction1,
          prediction2)
#output
TRUE


来源:https://stackoverflow.com/questions/50397115/caret-returns-different-predictions-with-caret-train-object-than-it-does-with-th

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!