I have built a Random Forest model, and I got two different sets of predictions from two different lines of code. I wonder which one is correct.
The difference is in the two calls to predict:
predict(model)
and
predict(model, newdata=dat)
The first option returns the out-of-bag (OOB) predictions for your training data: each observation is predicted using only the trees that did not see it during fitting. This is generally what you want when comparing predicted values to actuals.
The second treats your training data as if it were a new dataset and runs each observation down every tree, including the trees that were trained on it. This results in an artificially close agreement between the predictions and the actuals, since the RF algorithm generally doesn't prune the individual trees and relies instead on the ensemble of trees to control overfitting. So don't do this if you want honest predictions on the training data.
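To see the gap for yourself, here is a minimal sketch, assuming the model comes from the randomForest package. The simulated data frame `dat`, its columns, and the noise level are made up purely for illustration; only `predict(model)` and `predict(model, newdata=dat)` come from the question.

```r
library(randomForest)

set.seed(42)

# Hypothetical training data: a noisy nonlinear signal (illustration only)
n <- 300
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + 0.5 * dat$x2 + rnorm(n, sd = 0.5)

model <- randomForest(y ~ ., data = dat, ntree = 500)

# No newdata: out-of-bag predictions (the same values stored in model$predicted)
oob_pred <- predict(model)

# newdata = training data: every row is run down every tree,
# including the trees that were fit on that row
insample_pred <- predict(model, newdata = dat)

oob_mse      <- mean((dat$y - oob_pred)^2)
insample_mse <- mean((dat$y - insample_pred)^2)

cat("OOB MSE:      ", oob_mse, "\n")
cat("In-sample MSE:", insample_mse, "\n")
```

The in-sample error should come out clearly smaller than the OOB error. That difference is the optimism you get from scoring the forest on rows it has already seen, so the OOB number is the one to report.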