Stratified splitting of the data

礼貌的吻别 2020-12-15 08:12

I have a large data set and would like to fit a different logistic regression for each City, which is one of the columns in my data. The following 70/30 split works without considering City g

4 answers
  •  半阙折子戏
    2020-12-15 08:34

    Try createDataPartition from the caret package. Its documentation states: "By default, createDataPartition does a stratified random split of the data." The split is stratified on the vector you pass as the first argument (y).

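    library(caret)
    # stratified 70/30 split on the outcome Data$Class;
    # list = FALSE returns the training row indices as a one-column matrix
    train.index <- createDataPartition(Data$Class, p = .7, list = FALSE)
    train <- Data[ train.index,]   # 70% of the rows, class proportions preserved
    test  <- Data[-train.index,]   # the remaining 30%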
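    The question stratifies on City rather than on a class label. Passing Data$City instead of Data$Class as the first argument of createDataPartition should do that. The same idea can also be sketched in base R without caret; in the sketch below the data frame (columns City, x, y) is simulated and the helper stratified_split is hypothetical, not part of caret. It draws 70% of the row indices within each City, then fits one logistic regression (glm with family = binomial) per City on that City's training rows and predicts on its test rows.

```r
# Simulated (hypothetical) data: 3 cities, one predictor x, binary outcome y.
set.seed(42)
Data <- data.frame(
  City = rep(c("A", "B", "C"), times = c(100, 200, 300)),
  x    = rnorm(600)
)
Data$y <- rbinom(nrow(Data), 1, plogis(0.5 * Data$x))

# Hypothetical helper: sample a fraction p of the row indices within each stratum.
stratified_split <- function(strata, p = 0.7) {
  idx_by_stratum <- split(seq_along(strata), strata)
  train_idx <- unlist(lapply(idx_by_stratum, function(idx) {
    # index into idx so a stratum with a single row is handled correctly
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

train.index <- stratified_split(Data$City, p = 0.7)
train <- Data[ train.index, ]
test  <- Data[-train.index, ]

# One logistic regression per City, fitted on that City's training rows.
fits <- lapply(split(train, train$City), function(d) {
  glm(y ~ x, family = binomial, data = d)
})

# Predicted probabilities for each City's held-out rows, using that City's model.
test_pred <- lapply(names(fits), function(city) {
  predict(fits[[city]], newdata = test[test$City == city, ], type = "response")
})

table(train$City)  # A: 70, B: 140, C: 210
```

    round() is used instead of floor() because floating-point error in p * n can otherwise drop a row from a stratum.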
    

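    caret's train() also stratifies on the outcome when it creates the folds for k-fold cross-validation, so stratified (repeated) k-fold CV can be set up like this: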

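    # repeated k-fold CV: number (folds) defaults to 10, repeated 3 times;
    # "..." stands for any other trainControl() options you need
    ctrl <- trainControl(method = "repeatedcv",
                         repeats = 3,
                         ...)
    # when calling train(), pass this control object via trControl;
    # "..." stands for your formula/data, model method, etc.
    train(...,
          trControl = ctrl,
          ...)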
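    With method = "repeatedcv", train() builds the folds itself. As a rough illustration of what a stratified fold assignment looks like, here is a base-R sketch; stratified_folds is a hypothetical helper, not caret's createFolds, and the outcome y is simulated. Within each class the rows are shuffled and dealt round-robin to folds 1..k, so every fold receives about the same share of each class.

```r
# Hypothetical helper: assign each row to one of k folds, stratified on `strata`.
stratified_folds <- function(strata, k = 10) {
  fold <- integer(length(strata))
  for (idx in split(seq_along(strata), strata)) {
    shuffled <- idx[sample.int(length(idx))]
    # deal the shuffled rows of this stratum round-robin to folds 1..k
    fold[shuffled] <- rep_len(seq_len(k), length(shuffled))
  }
  fold
}

set.seed(1)
y <- factor(rep(c("yes", "no"), times = c(30, 70)))
folds <- stratified_folds(y, k = 5)
table(folds, y)  # each fold holds 14 "no" and 6 "yes"
```

    For the repeated variant (repeats = 3), the helper could be called three times with different seeds. In this sketch, leftover rows of a class whose size is not a multiple of k go to the lower-numbered folds.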
    

    Check out the caret documentation for more details.
