Ensemble different datasets in R

女生的网名这么多〃 提交于 2020-01-06 04:45:10

问题


I am trying to combine signals from different models using the example described here . I have different datasets which predicts the same output. However, when I combine the model output in caretList, and ensemble the signals, it gives an error

Error in check_bestpreds_resamples(modelLibrary) : 
  Component models do not have the same re-sampling strategies

Here is the reproducible example

library(caret)
library(caretEnsemble)
df1 <-
  data.frame(x1 = rnorm(200),
             x2 = rnorm(200),
             y = as.factor(sample(c("Jack", "Jill"), 200, replace = T)))

df2 <-
  data.frame(z1 = rnorm(400),
             z2 = rnorm(400),
             y = as.factor(sample(c("Jack", "Jill"), 400, replace = T)))

library(caret)
check_1 <- train( x = df1[,1:2],y = df1[,3],
                 method = "nnet",
                 tuneLength = 10,
                 trControl = trainControl(method = "cv",
                                          classProbs = TRUE,
                                          savePredictions = T))

check_2 <- train( x = df2[,1:2],y = df2[,3] ,
                 method = "nnet",
                 preProcess = c("center", "scale"),
                 tuneLength = 10,
                 trControl = trainControl(method = "cv",
                                          classProbs = TRUE,
                                          savePredictions = T))


combine <- c(check_1, check_2)
ens <- caretEnsemble(combine)

回答1:


First of all, you are trying to combine 2 models trained on different training data sets. That is not going to work. All ensemble models will need to be based on the same training set. You will have different sets of resamples in each trained model. Hence your current error.

Also building your models without using caretList is dangerous because you will have a big change of getting different resample strategies. You can control that a bit better by using the index in trainControl (see vignette).

If you use 1 dataset you can use the following code:

ctrl <- trainControl(method = "cv",
                     number = 5,
                     classProbs = TRUE,
                     savePredictions = "final")

set.seed(1324)
# will generate the following warning:
# indexes not defined in trControl.  Attempting to set them ourselves, so 
# each model in the ensemble will have the same resampling indexes.
models <- caretList(x = df1[,1:2],
                    y = df1[,3] ,
                    trControl = ctrl,
                    tuneList = list(
                      check_1 = caretModelSpec(method = "nnet", tuneLength = 10),
                      check_2 = caretModelSpec(method = "nnet", tuneLength = 10, preProcess = c("center", "scale"))
                    )) 


ens <- caretEnsemble(models)


A glm ensemble of 2 base models: nnet, nnet

Ensemble results:
Generalized Linear Model 

200 samples
  2 predictor
  2 classes: 'Jack', 'Jill' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
Resampling results:

  Accuracy   Kappa     
  0.5249231  0.04164767

Also read this guide on different ensemble strategies.



来源:https://stackoverflow.com/questions/49725934/ensemble-different-datasets-in-r

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!