R caret: estimate parameters on a subset, then fit to the full data

离开以前 2021-01-28 15:40

I have a dataset of 550k items that I split into 500k for training and 50k for testing. During the training stage it is necessary to establish the 'best' combination of each algori

1 answer
  •  無奈伤痛
    2021-01-28 16:15

    This is possible by specifying the index, indexOut and indexFinal arguments to trainControl.

    Here is an example using the Sonar data set from the mlbench library:

    library(caret)
    library(mlbench)
    data(Sonar)
    

    Let's say we want to draw half of the Sonar data set each time for training, and repeat that 10 times:

    train_inds <- replicate(10, sample(1:nrow(Sonar), size = nrow(Sonar)/2), simplify = FALSE)
    

    If you are interested in a different sampling approach please post the details. This is for illustration only.
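
    As one possible alternative, here is a minimal sketch of stratified sampling with caret's createDataPartition, which keeps the class balance roughly equal in every draw. The seed and the name train_inds_strat are assumptions for illustration only, not part of the original answer:

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# 10 stratified draws, each holding roughly half of the rows,
# sampled within the levels of Sonar$Class
train_inds_strat <- createDataPartition(Sonar$Class, p = 0.5, times = 10, list = TRUE)

length(train_inds_strat)   # number of resamples
lengths(train_inds_strat)  # rows drawn in each resample
```

    The resulting list can be passed to the index argument in the same way as train_inds.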

    For testing we will use 10 random rows that are not in train_inds:

    test_inds <- lapply(train_inds, function(x) {
      # rows that were not drawn for training in this resample
      inds <- setdiff(1:nrow(Sonar), x)
      sample(inds, size = 10)
    })
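
    Before handing the indices to caret, it may be worth checking that no row appears in both the training and hold-out set of the same resample. The sketch below rebuilds the indices so it runs on its own; the seed and the helper name overlap are assumptions for illustration:

```r
library(mlbench)
data(Sonar)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- nrow(Sonar)

# Same construction as in the answer: half of the rows for training,
# 10 of the remaining rows for testing, repeated 10 times
train_inds <- replicate(10, sample(n, size = n / 2), simplify = FALSE)
test_inds <- lapply(train_inds, function(x) sample(setdiff(seq_len(n), x), size = 10))

# For every resample, count rows present in both index sets
overlap <- mapply(function(tr, te) length(intersect(tr, te)), train_inds, test_inds)

stopifnot(all(overlap == 0))              # no leakage between training and hold-out rows
stopifnot(all(lengths(test_inds) == 10))  # every hold-out set has 10 rows
```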
    

    Now just specify the test_inds and train_inds in trainControl:

    ctrl <-  trainControl(
        method = "boot",
        number = 10,
        classProbs = TRUE,
        savePredictions = "final",
        index = train_inds,
        indexOut = test_inds,
        indexFinal = 1:nrow(Sonar),
        summaryFunction = twoClassSummary
      )
    

    Above, indexFinal = 1:nrow(Sonar) fits the final model on all rows. Pass a different integer vector to indexFinal if you do not wish to fit the final model on all rows.
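
    For instance, a control object that fits the final model on only part of the data could be sketched as follows. The choice of 150 rows and the names final_inds and ctrl_subset are hypothetical, chosen only to illustrate the argument:

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(2)  # arbitrary seed, only so the sketch is reproducible
n <- nrow(Sonar)

train_inds <- replicate(10, sample(n, size = n / 2), simplify = FALSE)
test_inds <- lapply(train_inds, function(x) sample(setdiff(seq_len(n), x), size = 10))

# Hypothetical: fit the final model on 150 randomly chosen rows only
final_inds <- sample(n, size = 150)

ctrl_subset <- trainControl(
  method = "boot",
  number = 10,
  classProbs = TRUE,
  savePredictions = "final",
  index = train_inds,       # rows used to fit each resample
  indexOut = test_inds,     # rows used to evaluate each resample
  indexFinal = final_inds,  # rows used to fit the final model
  summaryFunction = twoClassSummary
)
```

    Passing ctrl_subset as trControl to train should then tune on the resamples as before, but refit the chosen model on final_inds rather than on every row.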

    Then fit the model:

    model <- train(
        Class ~ .,
        data = Sonar,
        method = "rf",
        trControl = ctrl,
        metric = "ROC"
      )
    model
    #output
    Random Forest 
    
    208 samples, 208 used for final model
     60 predictor
      2 classes: 'M', 'R' 
    
    No pre-processing
    Resampling: Bootstrapped (10 reps) 
    Summary of sample sizes: 104, 104, 104, 104, 104, 104, ... 
    Resampling results across tuning parameters:
    
      mtry  ROC        Sens    Spec     
       2    0.9104167  0.7750  0.8250000
      31    0.9125000  0.7875  0.7916667
      60    0.9083333  0.7875  0.8166667
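
    To confirm which rows were used where, the fitted object can be inspected afterwards. This is a sketch that repeats the setup so it runs on its own; it assumes the randomForest package is installed, and the seed is arbitrary, so the resampling numbers will differ from the output above:

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(3)  # arbitrary seed, only so the sketch is reproducible
n <- nrow(Sonar)

train_inds <- replicate(10, sample(n, size = n / 2), simplify = FALSE)
test_inds <- lapply(train_inds, function(x) sample(setdiff(seq_len(n), x), size = 10))

ctrl <- trainControl(
  method = "boot",
  number = 10,
  classProbs = TRUE,
  savePredictions = "final",
  index = train_inds,
  indexOut = test_inds,
  indexFinal = seq_len(n),
  summaryFunction = twoClassSummary
)

model <- train(Class ~ ., data = Sonar, method = "rf", trControl = ctrl, metric = "ROC")

model$bestTune              # mtry value selected on the hold-out rows
table(model$pred$Resample)  # hold-out predictions saved per resample
head(predict(model, newdata = Sonar, type = "prob"))  # probabilities from the final model
```

    With savePredictions = "final", model$pred holds the hold-out predictions for the selected mtry only, so each resample should contribute the 10 rows from test_inds.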
    
