Stratified Train/Test-split in scikit-learn

I need to split my data into a training set (75%) and test set (25%). I currently do that with the code below:

X, Xt, userInfo, userInfo_train = sklearn.cros         

  •  星月不相逢
    2020-11-27 03:19

    You can simply do it with the train_test_split() function available in scikit-learn, using its stratify parameter:

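    from sklearn.model_selection import train_test_split
    # X is assumed to be a pandas DataFrame; stratify keeps the class proportions of this column in both splits
    train, test = train_test_split(X, test_size=0.25, stratify=X['YOUR_COLUMN_LABEL'])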
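A minimal, self-contained sketch of the call above, assuming a hypothetical DataFrame `df` with an imbalanced `label` column standing in for `X` and `'YOUR_COLUMN_LABEL'`. With 80 rows and a 60/20 class split, a stratified 25% test set should end up with 15 rows of class 0 and 5 rows of class 1:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: 80 rows, 'label' is imbalanced 3:1 (60 zeros, 20 ones)
df = pd.DataFrame({
    "feature": range(80),
    "label": [0] * 60 + [1] * 20,
})

# 75% train / 25% test; stratify on the label column so both splits keep the 3:1 ratio
train, test = train_test_split(
    df, test_size=0.25, stratify=df["label"], random_state=0
)

print(len(train), len(test))                    # 60 20
print(train["label"].value_counts().to_dict())  # {0: 45, 1: 15}
print(test["label"].value_counts().to_dict())   # {0: 15, 1: 5}
```

The random_state value is arbitrary; it only makes the row selection reproducible; the per-class counts are the same for any seed because the ratio divides evenly here.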
    

    I have also prepared a short GitHub Gist that shows how the stratify option works:

    https://gist.github.com/SHi-ON/63839f3a3647051a180cb03af0f7d0d9
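To see what stratify changes, here is a rough comparison (not the code from the gist) that passes separate NumPy arrays for features and labels, which gives the four-output form used in the question. The data and variable names are hypothetical. With stratify=y the 80/20 label ratio is reproduced exactly in both splits; without it, the test-set class counts depend on the random shuffle:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples with 2 features; labels are 80 zeros and 20 ones
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# Stratified: returns X_train, X_test, y_train, y_test with the 80/20 ratio kept
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
print("stratified train counts:", np.bincount(y_train))  # 60 zeros, 15 ones
print("stratified test counts: ", np.bincount(y_test))   # 20 zeros, 5 ones

# Not stratified: same call without stratify; class counts can drift from 80/20
_, _, y_train_plain, y_test_plain = train_test_split(
    X, y, test_size=0.25, random_state=42
)
# minlength=2 so the output still has two entries if class 1 is missing entirely
print("plain test counts:      ", np.bincount(y_test_plain, minlength=2))
```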
