How does Caret generate an OLS model with K-fold cross validation?

本秂侑毒 提交于 2019-12-14 03:59:05

问题


Let's say I have some generic dataset for which an OLS regression is the best choice. So, I generate a model with some first-order terms and decide to use Caret in R for my regression coefficient estimates and error estimates.

In caret, this ends up being:

k10_cv = trainControl(method="cv", number=10)
ols_model = train(Y ~ X1 + X2 + X3, data = my_data, trControl = k10_cv, method = "lm")

From there, I can pull out regression information using summary(ols_model) and can also pull some more information by just calling ols_model.

When I just look at ols_model, is the RMSE/R-square/MAE being calculated via the typical k-fold CV approach? Also, when the model I see in summary(ols_model) is generated, is this model trained on the entire dataset or is it an average of models generated across each of the folds?

If not, in the interest of trading variance for bias, is there a way to acquire an OLS model within Caret that is trained on one fold at a time?


回答1:


Here's reproducible data for your example.

library("caret")
my_data <- iris

k10_cv <- trainControl(method="cv", number=10)

set.seed(100)
ols_model <- train(Sepal.Length ~  Sepal.Width + Petal.Length + Petal.Width,
                  data = my_data, trControl = k10_cv, method = "lm")


> ols_model$results
  intercept      RMSE  Rsquared       MAE     RMSESD RsquaredSD      MAESD
1      TRUE 0.3173942 0.8610242 0.2582343 0.03881222 0.04784331 0.02960042

1)The ols_model$results above is based on the mean of each of the different resampling below:

> (ols_model$resample)
        RMSE  Rsquared       MAE Resample
1  0.3386472 0.8954600 0.2503482   Fold01
2  0.3154519 0.8831588 0.2815940   Fold02
3  0.3167943 0.8904550 0.2441537   Fold03
4  0.2644717 0.9085548 0.2145686   Fold04
5  0.3769947 0.8269794 0.3070733   Fold05
6  0.3720051 0.7792611 0.2746565   Fold06
7  0.3258501 0.8095141 0.2647466   Fold07
8  0.2962375 0.8530810 0.2731445   Fold08
9  0.3059100 0.8351535 0.2611982   Fold09
10 0.2615792 0.9286246 0.2108592   Fold10

I.e.

> mean(ols_model$resample$RMSE)==ols_model$results$RMSE
[1] TRUE

2)The model is trained on the whole training set. You can check this with either using lm or specify method = "none" for the trainControl.

 coef(lm(Sepal.Length ~  Sepal.Width + Petal.Length + Petal.Width, data = my_data))
 (Intercept)  Sepal.Width Petal.Length  Petal.Width 
   1.8559975    0.6508372    0.7091320   -0.5564827 

Which is identical with ols_model$finalModel.



来源:https://stackoverflow.com/questions/52460078/how-does-caret-generate-an-ols-model-with-k-fold-cross-validation

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!