How to split/partition a dataset into training and test datasets for, e.g., cross validation?

前端未结

关注

 12  2069

醉话见心 2020-11-27 10:42

What is a good way to split a NumPy array randomly into training and testing/validation dataset? Something similar to the cvpartition or crossvalind

12条回答

广开言路 (楼主)

2020-11-27 11:25

I'm aware that my solution is not the best, but it comes in handy when you want to split data in a simplistic way, especially when teaching data science to newbies!

def simple_split(descriptors, targets):
    testX_indices = [i for i in range(descriptors.shape[0]) if i % 4 == 0]
    validX_indices = [i for i in range(descriptors.shape[0]) if i % 4 == 1]
    trainX_indices = [i for i in range(descriptors.shape[0]) if i % 4 >= 2]

    TrainX = descriptors[trainX_indices, :]
    ValidX = descriptors[validX_indices, :]
    TestX = descriptors[testX_indices, :]

    TrainY = targets[trainX_indices]
    ValidY = targets[validX_indices]
    TestY = targets[testX_indices]

    return TrainX, ValidX, TestX, TrainY, ValidY, TestY

According to this code, data will be split into three parts - 1/4 for the test part, another 1/4 for the validation part, and 2/4 for the training set.

0 讨论(0)

查看其它12个回答