Randomly split data by criterion into training and testing data set using R

灰色年华 2020-12-20 02:45

Gidday,

I'm looking for a way to randomly split a data frame (e.g. a 90/10 split) into training and testing sets for a model, while respecting a certain grouping criterion.

Im

2 answers
  •  长情又很酷
    2020-12-20 03:26

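    # Sample whole companies, not rows, so no company ends up in both sets.
    # unique() also works when companycode is not a factor.
    comps <- unique(df$companycode)

    # Draw 90% of the companies (rounded down) for training.
    trn <- sample(comps, floor(length(comps) * 0.9))

    df.trn <- subset(df, companycode %in% trn)
    df.tst <- subset(df, !(companycode %in% trn))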
    

    This splits your data so that 90% of companies are in the training set and the rest in the test set.

    This does not guarantee that 90% of your rows will be in the training set and 10% in the test set, because companies can contribute different numbers of rows. The rigorous way to achieve this is left as an exercise for the reader. The non-rigorous way would be to repeat the sampling until you get proportions that are roughly correct.
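
    The non-rigorous approach, repeating the sampling until the row proportions are roughly right, might be sketched as follows. The helper name `split_by_group`, its `tol` and `max_tries` parameters, and the toy data are all made up for illustration; treat this as one possible sketch rather than a definitive implementation.

    ```r
    # Hypothetical helper: resample whole groups until the training share of
    # rows is within `tol` of `train_frac`, or give up after `max_tries`.
    split_by_group <- function(df, group_col, train_frac = 0.9,
                               tol = 0.02, max_tries = 1000) {
      groups <- unique(as.character(df[[group_col]]))
      n_train <- max(1, floor(length(groups) * train_frac))
      for (i in seq_len(max_tries)) {
        trn <- sample(groups, n_train)
        in_trn <- df[[group_col]] %in% trn
        # mean() of a logical vector is the fraction of rows in training.
        if (abs(mean(in_trn) - train_frac) <= tol) {
          return(list(train = df[in_trn, , drop = FALSE],
                      test  = df[!in_trn, , drop = FALSE],
                      tries = i))
        }
      }
      stop("no split within tolerance after ", max_tries, " tries")
    }

    # Toy data (invented for this example): 50 companies, uneven row counts.
    set.seed(42)
    sizes <- sample(1:30, 50, replace = TRUE)
    df <- data.frame(companycode = rep(sprintf("C%02d", 1:50), times = sizes),
                     value = rnorm(sum(sizes)))

    res <- split_by_group(df, "companycode")
    nrow(res$train) / nrow(df)   # row share of the training set
    ```

    Each company still lands entirely in one set; only the draw is repeated. With few companies or very uneven sizes, a tight `tol` may never be met, which is why the loop is capped and fails loudly instead of running forever.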
