Difference in prediction results from a random forest model

猫巷女王i 2021-01-07 11:18

I have built a random forest model, and I got two different prediction results from two different lines of code used to generate the predictions. I wonder which o

1 answer
  • 2021-01-07 11:45

    The difference is in the two calls to predict:

    predict(model)
    

    and

    predict(model, newdata=dat)
    

    The first call returns the out-of-bag (OOB) predictions on your training data: each observation is predicted using only the trees whose bootstrap sample did not include it. This is generally what you want when comparing predicted values to actuals.

    The second call treats your training data as if it were a new dataset and runs every observation down every tree, including the trees that were trained on that observation. This results in an artificially close correlation between the predictions and the actuals, since the RF algorithm generally doesn't prune the individual trees and relies instead on the ensemble of trees to control overfitting. So don't do this if you want predictions on the training data.
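
To see the gap between the two calls for yourself, here is a minimal sketch. It assumes the model was fitted with the randomForest package, and it uses the built-in mtcars data as a stand-in for your own training set; the names dat, rmse, and the mpg ~ . formula are illustrative only, not taken from the question.

```r
# Sketch only: assumes the randomForest package and uses mtcars as stand-in data.
library(randomForest)

set.seed(42)                        # make the bootstrap samples reproducible
dat   <- mtcars                     # hypothetical training data
model <- randomForest(mpg ~ ., data = dat, ntree = 500)

pred_oob      <- predict(model)                 # out-of-bag predictions
pred_insample <- predict(model, newdata = dat)  # training rows run down every tree

# Compare each set of predictions with the actual values.
rmse <- function(pred, actual) sqrt(mean((pred - actual)^2))
rmse_oob      <- rmse(pred_oob, dat$mpg)
rmse_insample <- rmse(pred_insample, dat$mpg)
cor_oob       <- cor(pred_oob, dat$mpg)
cor_insample  <- cor(pred_insample, dat$mpg)

print(c(rmse_oob = rmse_oob, rmse_insample = rmse_insample))
print(c(cor_oob = cor_oob, cor_insample = cor_insample))
```

On data like this, the in-sample error should come out clearly smaller and the in-sample correlation clearly higher than the OOB figures. The OOB numbers are the ones to report if you want a realistic picture of how the model performs on observations it has not seen.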
