What is a good way to split a NumPy array randomly into training and testing/validation dataset? Something similar to the cvpartition or crossvalind
I'm aware that my solution is not the best, but it comes in handy when you want to split data in a simplistic way, especially when teaching data science to newbies!
def simple_split(descriptors, targets):
testX_indices = [i for i in range(descriptors.shape[0]) if i % 4 == 0]
validX_indices = [i for i in range(descriptors.shape[0]) if i % 4 == 1]
trainX_indices = [i for i in range(descriptors.shape[0]) if i % 4 >= 2]
TrainX = descriptors[trainX_indices, :]
ValidX = descriptors[validX_indices, :]
TestX = descriptors[testX_indices, :]
TrainY = targets[trainX_indices]
ValidY = targets[validX_indices]
TestY = targets[testX_indices]
return TrainX, ValidX, TestX, TrainY, ValidY, TestY
According to this code, data will be split into three parts - 1/4 for the test part, another 1/4 for the validation part, and 2/4 for the training set.