How to randomly split data into a training set and a test set?

花落未央 2020-12-07 16:27

I have a large dataset and want to split it randomly into a training set (50%) and a test set (50%).

Say I have 100 examples stored in the input file, one example per line.

9 answers
  •  隐瞒了意图╮
    2020-12-07 17:10

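    You can try this approach:

    import pandas
    # sklearn.cross_validation only exists in scikit-learn < 0.20
    from sklearn.cross_validation import train_test_split

    csv = pandas.read_csv('data.csv')
    # train_size=0.5 puts half of the rows in each set; rows are shuffled by default
    train, test = train_test_split(csv, train_size=0.5)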
    

    UPDATE: train_test_split was moved to model_selection, so the current way (as of scikit-learn 0.22.2) to do it is:

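    import pandas
    # import the function directly; a bare "import sklearn" does not load submodules
    from sklearn.model_selection import train_test_split

    csv = pandas.read_csv('data.csv')
    # shuffles the rows, then splits 50/50; pass random_state=<int> for a repeatable split
    train, test = train_test_split(csv, train_size=0.5)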
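
If you would rather not depend on pandas or scikit-learn, a rough sketch using only the standard library could look like the following. The helper name `split_lines`, the seed value, and the generated `examples` list are assumptions made for illustration; in practice you would read the lines from your own input file.

```python
import random


def split_lines(lines, train_fraction=0.5, seed=0):
    """Shuffle a copy of `lines` and split it into (train, test)."""
    shuffled = list(lines)                 # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG gives a repeatable shuffle
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 100 lines of the input file (hypothetical data).
# With a real file: lines = open(path).read().splitlines()
examples = [f"example {i}" for i in range(100)]

train, test = split_lines(examples)
print(len(train), len(test))  # prints "50 50"
```

Shuffling first and then cutting the list guarantees that every example lands in exactly one of the two sets. Use a different seed, or `random.shuffle` without a fixed seed, if you want a different split on each run.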
    
