How to split/partition a dataset into training and test datasets for, e.g., cross validation?

前端未结

关注

 12  2008

醉话见心 2020-11-27 10:42

What is a good way to split a NumPy array randomly into training and testing/validation dataset? Something similar to the cvpartition or crossvalind

12条回答

北海茫月 (楼主)

2020-11-27 11:25
Likely you will not only need to split into train and test, but also cross validation to make sure your model generalizes. Here I am assuming 70% training data, 20% validation and 10% holdout/test data.

Check out the np.split:

If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in

ary[:2] ary[2:3] ary[3:]
```
t, v, h = np.split(df.sample(frac=1, random_state=1), [int(0.7*len(df)), int(0.9*len(df))]) 
```
0 讨论(0)

查看其它12个回答
发布评论:

提交评论
- 加载中...