scikit-learn random state in splitting dataset

无人共我 2020-12-05 09:12

Can anyone tell me why we set random_state to zero when splitting a dataset into training and test sets?

X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size


        
9 Answers
  •  感动是毒
    2020-12-05 09:52

    When random_state is set to an integer, train_test_split returns the same split on every execution.

    When random_state is set to None, train_test_split draws from NumPy's global random state, so the split generally differs from one execution to the next.

    See the example below:

    from sklearn.model_selection import train_test_split

    X_data = range(10)
    y_data = range(10)

    # Fixed seed: every call produces the same split.
    for i in range(5):
        X_train, X_test, y_train, y_test = train_test_split(
            X_data, y_data, test_size=0.3, random_state=0)  # zero or any other integer
        print(y_test)

    print("*" * 30)

    # No seed: each call draws a fresh shuffle, so the split varies.
    for i in range(5):
        X_train, X_test, y_train, y_test = train_test_split(
            X_data, y_data, test_size=0.3, random_state=None)
        print(y_test)
    

    Output:

    [2, 8, 4]
    [2, 8, 4]
    [2, 8, 4]
    [2, 8, 4]
    [2, 8, 4]
    ******************************
    [4, 7, 6]
    [4, 3, 7]
    [8, 1, 4]
    [9, 5, 8]
    [6, 4, 5]
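
As for why zero in particular: the value itself carries no special meaning, it is just a fixed seed. The sketch below is a minimal illustration of that point. The helper name split_test_labels and the second seed 42 are made up for this example and are not part of scikit-learn. It repeats the split with the same seed and then with a different one, which will usually select different rows.

```python
from sklearn.model_selection import train_test_split

X_data = list(range(10))
y_data = list(range(10))


def split_test_labels(seed):
    """Hypothetical helper: return the test-set labels for a given seed."""
    _, _, _, y_test = train_test_split(
        X_data, y_data, test_size=0.3, random_state=seed
    )
    return y_test


# Same seed twice: the two splits are identical.
same_seed_a = split_test_labels(0)
same_seed_b = split_test_labels(0)

# A different seed is just as reproducible, but it is a different shuffle.
other_seed = split_test_labels(42)

print(same_seed_a == same_seed_b)
print(same_seed_a, other_seed)
```

So any integer works, as long as the same one is reused whenever the results need to be comparable between runs.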
