I have a large data set and would like to fit a separate logistic regression for each City, which is one of the columns in my data. The following 70/30 split works without considering City g
Try createDataPartition from the caret package. Its documentation states: "By default, createDataPartition does a stratified random split of the data."
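library(caret)
# Stratified 70/30 split on the outcome column; list = FALSE returns row indices as a matrix
train.index <- createDataPartition(Data$Class, p = .7, list = FALSE)
train <- Data[ train.index, ]  # about 70% of the rows
test  <- Data[-train.index, ]  # the remaining rows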
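Note that the split above stratifies on Class, not on City. If the goal is a 70/30 split within every city, one option is to sample row indices separately for each city. The following is a minimal base-R sketch of that idea, not what createDataPartition does internally. The toy data frame and its column x are hypothetical stand-ins for the real data, which is assumed to have City and Class columns.

```r
set.seed(42)  # make the random split reproducible

# Toy data (hypothetical): three cities with 100, 50 and 20 rows
Data <- data.frame(
  City  = rep(c("A", "B", "C"), times = c(100, 50, 20)),
  x     = rnorm(170),
  Class = factor(sample(c("yes", "no"), 170, replace = TRUE))
)

# Group the row numbers by city, then draw 70% of each group
rows_by_city <- split(seq_len(nrow(Data)), Data$City)
train.index <- unlist(lapply(rows_by_city, function(idx) {
  # index via sample.int so a city with a single row is handled safely
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}), use.names = FALSE)

train <- Data[ train.index, ]
test  <- Data[-train.index, ]

# Per-city row counts in each part
table(train$City)
table(test$City)
```

With caret, passing Data$City instead of Data$Class as the first argument to createDataPartition should give a comparable per-city split. Either way, the per-city models can then be fit on the pieces of split(train, train$City).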
caret can also be used for stratified K-fold cross-validation, for example:
ctrl <- trainControl(method = "repeatedcv",
                     repeats = 3,
                     ...)
# when calling train, pass this train control
train(...,
      trControl = ctrl,
      ...)
See the caret documentation for more details.