Scikit-learn balanced subsampling

终归单人心 2020-12-02 10:34

I'm trying to create N balanced random subsamples of my large unbalanced dataset. Is there a way to do this simply with scikit-learn / pandas, or do I have to implement it myself?

13 answers
  •  再見小時候
    2020-12-02 11:08

    I found the best solutions here

    This is the one I think is the simplest.

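    import pandas as pd
    from imblearn.under_sampling import RandomUnderSampler

    # Columns 1-11 hold the features; column 12 holds the class label.
    dataset = pd.read_csv("data.csv")
    X = dataset.iloc[:, 1:12].values
    y = dataset.iloc[:, 12].values

    # Older imbalanced-learn releases (before 0.4): return_indices=True
    # also returns the indices of the rows that were kept.
    rus = RandomUnderSampler(return_indices=True)
    X_rus, y_rus, id_rus = rus.fit_sample(X, y)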
    

    Then you can use the X_rus and y_rus data.

    For imbalanced-learn 0.4 and later:

    from imblearn.under_sampling import RandomUnderSampler

    # fit_resample is the current name of the method formerly called fit_sample.
    rus = RandomUnderSampler()
    X_rus, y_rus = rus.fit_resample(X, y)
    

    The indices of the randomly selected samples are then available through the sample_indices_ attribute.
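
The question asks for N balanced subsamples rather than one. As a rough illustration of what random undersampling does, here is a library-free sketch that draws N index sets, each containing the same number of rows per class. The function name balanced_subsample_indices, its parameters, and the toy labels are assumptions made up for this example; they are not part of imbalanced-learn or scikit-learn.

```python
import random
from collections import defaultdict


def balanced_subsample_indices(labels, n_subsamples, seed=None):
    """Return n_subsamples lists of row indices, each with equal class counts.

    Hypothetical helper for illustration only; not an imbalanced-learn API.
    """
    rng = random.Random(seed)

    # Group row positions by class label.
    by_class = defaultdict(list)
    for position, label in enumerate(labels):
        by_class[label].append(position)

    # Every class is cut down to the size of the rarest class.
    minority_size = min(len(positions) for positions in by_class.values())

    subsamples = []
    for _ in range(n_subsamples):
        chosen = []
        for positions in by_class.values():
            # Sample without replacement within a single subsample.
            chosen.extend(rng.sample(positions, minority_size))
        subsamples.append(sorted(chosen))
    return subsamples


# Toy labels (made up): 8 rows of class 0 and 3 rows of class 1.
y_demo = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
index_sets = balanced_subsample_indices(y_demo, n_subsamples=4, seed=0)
```

Each index list can then be used to select rows, for example dataset.iloc[index_sets[0]] with the dataset loaded in the answer above. Rows are drawn without replacement inside one subsample, but the same row may appear in several subsamples.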
