I have a large dataset and want to split it into training(50%) and testing set(50%).
Say I have 100 examples stored the input file, each line contains one example.
sklearn.cross_validation
is deprecated since version 0.18, instead you should use sklearn.model_selection
as show below
from sklearn.model_selection import train_test_split
import numpy
with open("datafile.txt", "rb") as f:
data = f.read().split('\n')
data = numpy.array(data) #convert array to numpy type array
x_train ,x_test = train_test_split(data,test_size=0.5) #test_size=0.5(whole_data)