Randomly split data by criterion into training and testing data set using R


灰色年华 2020-12-20 02:45

Gidday,

I'm looking for a way to randomly split a data frame (e.g. a 90/10 split) into training and testing sets for a model while respecting a certain grouping criterion.

Im

2 answers

长情又很酷 (original poster)

2020-12-20 03:26
```
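# Unique company codes; unique() also works when companycode is not a factor
comps <- unique(df$companycode)

# Randomly pick 90% of the companies (rounded down) for training
trn <- sample(comps, floor(length(comps) * 0.9))

# All rows of a company go to the same set
df.trn <- subset(df, companycode %in% trn)
df.tst <- subset(df, !(companycode %in% trn))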
```
This splits your data so that 90% of companies are in the training set and the rest in the test set.

This does not guarantee that 90% of your rows will be training and 10% test. The rigorous way to achieve this is left as an exercise for the reader. The non-rigorous way would be to repeat the sampling until you get proportions that are roughly correct.
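The non-rigorous approach could look roughly like the sketch below. It is only an illustration under stated assumptions: the toy data frame, the 2% tolerance, and the cap of 1000 attempts are all made up for the example and are not part of the original answer.

```r
set.seed(42)  # only to make this illustration reproducible

# Hypothetical toy data: 20 companies with unequal numbers of rows
df <- data.frame(
  companycode = rep(sprintf("C%02d", 1:20),
                    times = sample(5:50, 20, replace = TRUE)),
  stringsAsFactors = FALSE
)
df$value <- rnorm(nrow(df))

comps <- unique(df$companycode)
n_trn <- floor(length(comps) * 0.9)  # number of companies in the training set

target    <- 0.9    # desired share of rows in the training set
tolerance <- 0.02   # accepted deviation from the target (arbitrary choice)
max_tries <- 1000   # safety cap so the loop always terminates

# Redraw the company sample until the row share is close enough to the target
for (i in seq_len(max_tries)) {
  trn    <- sample(comps, n_trn)
  in_trn <- df$companycode %in% trn
  if (abs(mean(in_trn) - target) <= tolerance) break
}

# If no draw met the tolerance, the last draw is used; warn about it
if (abs(mean(in_trn) - target) > tolerance) {
  warning("No split within tolerance found; using the last draw.")
}

df.trn <- df[in_trn, ]
df.tst <- df[!in_trn, ]

# Share of rows that ended up in the training set
mean(in_trn)
```

Each company still lands entirely in one set; only the choice of companies is redrawn. With few or very unevenly sized companies, a tight tolerance may never be met, which is why the loop is capped.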