How to split data into a training set and a test set randomly?

花落未央 2020-12-07 16:27

I have a large dataset and want to split it randomly into a training set (50%) and a test set (50%).

Say I have 100 examples stored in the input file, one example per line.

9 Answers
  •  心在旅途
    2020-12-07 17:25

    To answer @desmond.carros's question, I modified the best answer as follows:

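     import random

     DELIMITER = ","  # replace with your preferred delimiter

     # Read one example per line and split it into fields.
     with open("datafile.txt", "r") as file:
         data = [line.rstrip("\n").split(DELIMITER) for line in file]

     random.shuffle(data)  # shuffle in place so the split is random

     split = int(len(data) * 0.80)  # index separating train from test
     train_data = data[:split]  # first 80% goes to the training set
     test_data = data[split:]   # remaining 20% goes to the test set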
    

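    The code shuffles the entire dataset and then splits it into 80% training data and 20% test data. For the 50/50 split asked about in the question, replace the 0.80 ratio with 0.50.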
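
    The same idea can be wrapped in a small function so the ratio is a parameter. The following is only a sketch: the name `split_lines`, its `train_fraction` and `seed` parameters, and the generated `examples` list (standing in for the 100 lines of the input file) are assumptions for illustration, not part of the original answer.

```python
import random


def split_lines(lines, train_fraction=0.5, seed=None):
    """Shuffle a copy of `lines` and split it into (train, test) lists."""
    rng = random.Random(seed)  # private generator; a fixed seed makes the split repeatable
    shuffled = list(lines)     # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # index separating train from test
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for the 100 lines read from the input file.
examples = [f"example {i}" for i in range(100)]
train, test = split_lines(examples, train_fraction=0.5, seed=0)
```

    Passing a seed is optional; it is useful when the same split has to be reproduced later, for example to compare two models on identical test data.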
