Gidday,
I'm looking for a way to randomly split a data frame (e.g. a 90/10 split) for training and testing a model while keeping a certain grouping criterion.
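# Sample 90% of the companies (not of the rows) for training.
# unique() also works when companycode is not a factor and skips unused levels.
comps <- unique(as.character(df$companycode))
trn <- sample(comps, round(length(comps) * 0.9))
# All rows of a given company end up in the same set.
df.trn <- subset(df, companycode %in% trn)
df.tst <- subset(df, !(companycode %in% trn))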
This splits your data so that 90% of companies are in the training set and the rest in the test set.
This does not guarantee that 90% of your rows will be training and 10% test. The rigorous way to achieve this is left as an exercise for the reader. The non-rigorous way would be to repeat the sampling until you get proportions that are roughly correct.
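For what it's worth, the non-rigorous approach could look roughly like the sketch below. It keeps resampling companies until the share of rows in the training set is within a tolerance of the target. The function name split_by_group, its arguments (p, tol, max_tries) and the toy data frame are made up for illustration; only the companycode column comes from the question.

```r
# Hypothetical helper: resample groups until the training row share is close to p.
split_by_group <- function(df, group, p = 0.9, tol = 0.02, max_tries = 1000) {
  groups <- unique(as.character(df[[group]]))
  n_trn <- round(length(groups) * p)   # number of groups to put in training
  for (i in seq_len(max_tries)) {
    trn <- sample(groups, n_trn)
    in_trn <- df[[group]] %in% trn     # TRUE for rows whose group is in training
    if (abs(mean(in_trn) - p) <= tol) {
      return(list(train = df[in_trn, , drop = FALSE],
                  test  = df[!in_trn, , drop = FALSE],
                  tries = i))
    }
  }
  stop("no split within tolerance after ", max_tries, " tries")
}

# Toy data (assumed, not from the original post): 50 companies, 1000 rows.
set.seed(1)
df <- data.frame(companycode = sample(paste0("C", 1:50), 1000, replace = TRUE),
                 x = rnorm(1000))
res <- split_by_group(df, "companycode", p = 0.9, tol = 0.02)
nrow(res$train) / nrow(df)   # row share in training, within tol of 0.9
```

If a few companies hold most of the rows, no split may land inside the tolerance, in which case the function stops with an error instead of looping forever; loosening tol is the simplest way out.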