How to randomly split data into a training set and a test set?

花落未央 2020-12-07 16:27

I have a large dataset and want to split it into a training set (50%) and a test set (50%).

Say I have 100 examples stored in the input file, with one example per line.

9 answers
  •  猫巷女王i
    2020-12-07 17:22

    A quick note on the answer from @subin sahayam:

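     import random

     delimiter = ","  # placeholder: replace with your preferred delimiter
     with open("datafile.txt", "r") as file:
         # One example per line; strip the newline, then split into fields
         data = [line.strip().split(delimiter) for line in file]

     random.shuffle(data)             # shuffle the examples in place
     split = int(len(data) * 0.80)    # index where the test set starts
     train_data = data[:split]        # first 80% goes to the training set
     test_data = data[split:]         # remaining 20% goes to the test set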
    

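    The +1 in the following line from the original answer is not always safe. Whenever it pushes the start of the test slice past the end of the training slice, one example ends up in neither set. With 100 examples, for instance, the training slice ends at index 80 but the test slice starts at index 81, so the example at index 80 is dropped. Either check the list size first and decide whether the +1 is needed, or use the same index for both slices.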

    test_data = data[int(len(data)*.80+1):]
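
For the 50/50 split asked about in the question, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name split_examples, the train_fraction and seed parameters, and the dummy example strings are all hypothetical and not from the original answer. It computes one split index and uses it for both slices, so no example is dropped or duplicated.

```python
import random


def split_examples(examples, train_fraction=0.5, seed=None):
    """Shuffle a copy of `examples` and return (train, test) lists."""
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(examples)                # copy; the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # seed=None gives a different shuffle each run
    split = int(len(shuffled) * train_fraction)  # one index shared by both slices
    return shuffled[:split], shuffled[split:]


# Hypothetical data: 100 dummy examples, split 50/50 as in the question.
examples = [f"example {i}" for i in range(100)]
train_data, test_data = split_examples(examples, train_fraction=0.5, seed=0)
print(len(train_data), len(test_data))  # prints: 50 50
```

Passing a fixed seed makes the split reproducible between runs; leave it as None if a fresh random split is wanted each time.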
