Question
I am trying to fit an SVM model with 10-fold cross-validation and 3 repeats using the caret package in R. I want to get the prediction results for each fold using the best-tuned hyperparameters. I am using the following code:
# Load packages
library(mlbench)
library(caret)
# Load data
data(BostonHousing)
# Divide the data into training and test sets
set.seed(101)
sample <- createDataPartition(BostonHousing$medv, p=0.80, list = FALSE)
train <- BostonHousing[sample,]
test <- BostonHousing[-sample,]
control <- trainControl(method='repeatedcv', number=10, repeats=3, savePredictions=TRUE)
metric <- 'RMSE'
# Support Vector Machines (SVM)
set.seed(101)
fit.svm <- train(medv~., data=train, method='svmRadial', metric=metric,
                 preProc=c('center', 'scale'), trControl=control)
fit.svm$bestTune
fit.svm$pred
fit.svm$pred gives me predictions for all combinations of the hyperparameters. But I want only the predictions made with the best-tuned hyperparameters, averaged over the 10 folds for each repeat.
Answer 1:
One way to achieve your goal is to subset fit.svm$pred using the hyperparameters in fit.svm$bestTune, and then aggregate the desired measure by CV replicate. I will do this using dplyr:
library(tidyverse)
library(caret)
fit.svm$pred %>%
  filter(sigma == fit.svm$bestTune$sigma & C == fit.svm$bestTune$C) %>% # keep only the best-tuned rows
  mutate(fold = gsub("\\..*", "", Resample),        # extract fold info from resample info
         rep = gsub(".*\\.(.*)", "\\1", Resample)) %>% # extract replicate info from resample info
  group_by(rep) %>%                                 # group by replicate
  summarise(rmse = RMSE(pred, obs))                 # aggregate the desired measure
Output:
# A tibble: 3 x 2
  rep    rmse
  <chr> <dbl>
1 Rep1   4.02
2 Rep2   3.96
3 Rep3   4.06
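If what you need is one averaged prediction per observation rather than one RMSE per repeat, the same subset can be aggregated by rowIndex instead. The following is only a minimal sketch in base R: `preds` and `best` are made-up stand-ins for fit.svm$pred and fit.svm$bestTune (same column names, invented values) so that it runs without fitting a model.

```r
# Toy stand-in for fit.svm$pred: 2 observations, 2 repeats, 2 values of C.
# All numbers here are invented for illustration.
preds <- data.frame(
  pred     = c(19, 31, 21, 33, 20, 30, 22, 34),
  obs      = c(21, 33, 21, 33, 21, 33, 21, 33),
  rowIndex = c(1, 2, 1, 2, 1, 2, 1, 2),
  sigma    = 0.1,
  C        = c(0.5, 0.5, 0.5, 0.5, 1, 1, 1, 1),
  Resample = c("Fold1.Rep1", "Fold2.Rep1", "Fold1.Rep2", "Fold2.Rep2",
               "Fold1.Rep1", "Fold2.Rep1", "Fold1.Rep2", "Fold2.Rep2")
)

# Toy stand-in for fit.svm$bestTune
best <- data.frame(sigma = 0.1, C = 1)

# merge() joins on the shared columns (sigma, C), keeping only best-tuned rows
best_preds <- merge(preds, best)

# Average the hold-out predictions for each observation across repeats
avg_preds <- aggregate(cbind(pred, obs) ~ rowIndex, data = best_preds, FUN = mean)
avg_preds
```

With the real objects, `merge(fit.svm$pred, fit.svm$bestTune)` should play the same role as the filter() call above, since the tuning-parameter columns are the only ones the two data frames share.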
EDIT: If you dislike regex, or just want to save a bit of typing, you can use tidyr::separate (loaded with tidyverse):
fit.svm$pred %>%
  filter(sigma == fit.svm$bestTune$sigma & C == fit.svm$bestTune$C) %>%
  separate(Resample, c("fold", "rep"), "\\.") %>% # split "FoldXX.RepY" at the dot
  group_by(rep) %>%
  summarise(rmse = RMSE(pred, obs))
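To see what the two approaches do to the Resample labels, here is a small base R sketch. The example labels are assumed to follow the "FoldXX.RepY" pattern that caret uses for repeated CV, and the variable names are only illustrative.

```r
# Example labels in the assumed "FoldXX.RepY" pattern
resample <- c("Fold01.Rep1", "Fold10.Rep3")

# Regex approach: strip one side of the dot
fold   <- sub("\\..*", "", resample)  # drop the dot and everything after it
rep_id <- sub(".*\\.", "", resample)  # drop everything up to and including the dot

# Split approach (roughly what separate() does): cut each label at the literal dot
parts <- do.call(rbind, strsplit(resample, ".", fixed = TRUE))

data.frame(Resample = resample, fold = fold, rep = rep_id)
```

Both routes give the same two pieces, so the choice is mostly a matter of taste.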
EDIT2: In response to a comment, to write the observed and predicted values to a CSV file:
fit.svm$pred %>%
  filter(sigma == fit.svm$bestTune$sigma & C == fit.svm$bestTune$C) %>%
  write.csv("predictions.csv")
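As an alternative to filtering afterwards, trainControl also accepts savePredictions = "final", which is documented to keep the hold-out predictions only for the optimal tuning parameters. The sketch below reuses the setup from the question; the object names `idx`, `control_final` and `fit_final` are mine, and it assumes the mlbench, caret and kernlab packages are installed.

```r
library(mlbench)
library(caret)

data(BostonHousing)
set.seed(101)
idx   <- createDataPartition(BostonHousing$medv, p = 0.80, list = FALSE)
train <- BostonHousing[idx, ]

# "final" stores hold-out predictions only for the selected hyperparameters
control_final <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                              savePredictions = "final")

set.seed(101)
fit_final <- train(medv ~ ., data = train, method = "svmRadial", metric = "RMSE",
                   preProc = c("center", "scale"), trControl = control_final)

# Each training row is held out once per repeat, so expect 3 rows per observation
nrow(fit_final$pred)
head(fit_final$pred)
```

This keeps fit_final$pred smaller, at the cost of no longer being able to inspect predictions for the other hyperparameter combinations.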
Source: https://stackoverflow.com/questions/56950684/how-to-get-predictions-for-each-fold-in-10-fold-cross-validation-of-the-best-tun