How to split/partition a dataset into training and test datasets for, e.g., cross validation?

醉话见心 2020-11-27 10:42

What is a good way to split a NumPy array randomly into training and testing/validation datasets? Something similar to the cvpartition or crossvalind functions in MATLAB.
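For a plain NumPy array, one minimal way to do this is to shuffle the row indices with a random permutation and slice them. The sketch below assumes rows are samples; `random_split` and its `test_fraction`/`seed` parameters are hypothetical names for illustration, not a NumPy API.

```python
import numpy as np


def random_split(data, test_fraction=0.2, seed=None):
    """Shuffle the rows of `data` and split them into (train, test).

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    rng = np.random.default_rng(seed)
    # A random permutation of the row indices gives an unbiased shuffle.
    indices = rng.permutation(len(data))
    n_test = int(round(test_fraction * len(data)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Fancy indexing keeps each row intact while reordering.
    return data[train_idx], data[test_idx]


X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
train, test = random_split(X, test_fraction=0.3, seed=0)
print(train.shape, test.shape)  # (7, 2) (3, 2)
```

Passing a fixed `seed` makes the split reproducible, which is usually what you want when comparing models.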

12 answers
  •  北海茫月
    2020-11-27 11:25

    You will likely need to split your data not only into training and test sets, but also into a validation set, to check that your model generalizes. Here I assume 70% training data, 20% validation data and 10% holdout/test data.
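If you want proper k-fold cross validation rather than a single validation set, a rough analogue to MATLAB's crossvalind('Kfold', N, K) is to give every sample a random fold label. This is a sketch under that assumption; `kfold_labels` is a hypothetical helper, and unlike crossvalind its labels start at 0 rather than 1.

```python
import numpy as np


def kfold_labels(n_samples, k, seed=None):
    """Assign each of n_samples to one of k folds (labels 0..k-1).

    Hypothetical helper, loosely analogous to crossvalind('Kfold', N, K).
    """
    rng = np.random.default_rng(seed)
    # Cycle through 0..k-1 so fold sizes differ by at most one...
    labels = np.arange(n_samples) % k
    # ...then shuffle so that fold membership is random.
    rng.shuffle(labels)
    return labels


X = np.arange(10)
folds = kfold_labels(len(X), k=3, seed=0)
for fold in range(3):
    val_mask = folds == fold  # samples in this fold are held out
    train_part, val_part = X[~val_mask], X[val_mask]
    print(fold, len(train_part), len(val_part))
# 0 6 4
# 1 7 3
# 2 7 3
```

Each sample lands in the validation part of exactly one fold, so every sample is used for validation once and for training k - 1 times.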

    Check out the documentation for np.split:

    If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in

    ary[:2], ary[2:3], ary[3:]
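A quick way to see how the split indices behave on a small array (the toy array `ary` is just for illustration):

```python
import numpy as np

ary = np.arange(5)  # [0 1 2 3 4]
# Cut before index 2 and before index 3, giving three pieces.
parts = np.split(ary, [2, 3])
for part in parts:
    print(part)
# [0 1]
# [2]
# [3 4]
```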

    import numpy as np  # df: an existing pandas DataFrame; t, v, h = train, validation, holdout
    t, v, h = np.split(df.sample(frac=1, random_state=1), [int(0.7*len(df)), int(0.9*len(df))])
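The same 70/20/10 split can also be written with plain pandas slicing, which avoids passing a DataFrame to np.split (recent pandas versions may emit a deprecation warning for that). This is a sketch assuming a small hypothetical DataFrame `df`; only `DataFrame.sample` and `.iloc` are used.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 100 rows, two feature columns.
df = pd.DataFrame({"x1": np.arange(100), "x2": np.arange(100) * 2.0})

# Shuffle all rows once; random_state makes the shuffle reproducible.
shuffled = df.sample(frac=1, random_state=1)

# Same boundaries as the np.split call: 70% / 20% / 10%.
n = len(shuffled)
train_end, val_end = int(0.7 * n), int(0.9 * n)

t = shuffled.iloc[:train_end]         # training set
v = shuffled.iloc[train_end:val_end]  # validation set
h = shuffled.iloc[val_end:]           # holdout / test set

print(len(t), len(v), len(h))  # 70 20 10
```

Because all three parts are slices of one shuffled frame, they are disjoint and together cover every row exactly once.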
    
