How to split/partition a dataset into training and test datasets for, e.g., cross validation?

醉话见心 2020-11-27 10:42

What is a good way to split a NumPy array randomly into training and testing/validation datasets? Something similar to the cvpartition or crossvalind functions in MATLAB.

12 answers
  •  广开言路
    2020-11-27 11:25

    I'm aware that my solution is not the best, but it comes in handy when you want to split data in a simplistic way, especially when teaching data science to newbies!

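    def simple_split(descriptors, targets):
        # Deterministic split by row index modulo 4 (no shuffling).
        # Expects NumPy arrays: descriptors is 2-D with one sample per row.
        n_samples = descriptors.shape[0]
        testX_indices = [i for i in range(n_samples) if i % 4 == 0]   # rows 0, 4, 8, ...
        validX_indices = [i for i in range(n_samples) if i % 4 == 1]  # rows 1, 5, 9, ...
        trainX_indices = [i for i in range(n_samples) if i % 4 >= 2]  # all remaining rows
    
        # Select the feature rows for each subset.
        TrainX = descriptors[trainX_indices, :]
        ValidX = descriptors[validX_indices, :]
        TestX = descriptors[testX_indices, :]
    
        # Select the matching targets with the same indices.
        TrainY = targets[trainX_indices]
        ValidY = targets[validX_indices]
        TestY = targets[testX_indices]
    
        return TrainX, ValidX, TestX, TrainY, ValidY, TestY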
    

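    With this code, the data is split into three parts by row index: 1/4 goes to the test set, another 1/4 to the validation set, and the remaining 1/2 to the training set. Note that the assignment is deterministic (it depends only on the row position), so shuffle the rows first if the data is ordered.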
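
Since the question asks for a random split, here is a minimal sketch of a shuffled variant with the same 1/2, 1/4, 1/4 proportions. It assumes NumPy arrays with one sample per row; the name `random_split` and the `seed` parameter are illustrative and not part of the answer above.

```python
import numpy as np


def random_split(descriptors, targets, seed=0):
    """Shuffle row indices, then cut them into train/valid/test (about 50/25/25)."""
    n = descriptors.shape[0]
    rng = np.random.default_rng(seed)  # seeded so the split is reproducible
    order = rng.permutation(n)         # random ordering of the row indices

    n_test = n // 4
    n_valid = n // 4
    test_idx = order[:n_test]
    valid_idx = order[n_test:n_test + n_valid]
    train_idx = order[n_test + n_valid:]  # the remaining rows, about half

    return (descriptors[train_idx], descriptors[valid_idx], descriptors[test_idx],
            targets[train_idx], targets[valid_idx], targets[test_idx])


# Toy data: 20 samples with 3 features each; targets are the row numbers.
X = np.arange(60).reshape(20, 3)
y = np.arange(20)
TrainX, ValidX, TestX, TrainY, ValidY, TestY = random_split(X, y)
print(TrainX.shape, ValidX.shape, TestX.shape)  # (10, 3) (5, 3) (5, 3)
```

Indexing both arrays with the same shuffled indices keeps each row paired with its target. Changing `seed` gives a different but still reproducible split.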
