Scikit-learn balanced subsampling

终归单人心 2020-12-02 10:34

I'm trying to create N balanced random subsamples of my large unbalanced dataset. Is there a way to do this simply with scikit-learn / pandas or do I have to implement it m

13 answers
  •  北荒
    北荒 (original poster)
    2020-12-02 11:04

    A slight modification to the top answer by mikkom.

    This is for when you want to preserve the ordering of the larger class's data, i.e. you don't want to shuffle.

    Instead of

        if len(this_xs) > use_elems:
            np.random.shuffle(this_xs)
    

    do this

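        if len(this_xs) > use_elems:
            # Use integer division: a float slice step raises TypeError in Python 3
            ratio = len(this_xs) // use_elems
            # Take every ratio-th element, which keeps the original order
            this_xs = this_xs[::ratio]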
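
    To try the strided idea outside the surrounding function, here is a minimal self-contained sketch. The helper name `strided_subsample` is hypothetical, and the final trim to `use_elems` is an assumption about what the caller wants: a stride of `len(this_xs) // use_elems` can return a few more rows than requested, so the sketch cuts the result back to the target size.

```python
import numpy as np


def strided_subsample(this_xs, use_elems):
    """Pick use_elems evenly spaced rows from this_xs, keeping their order.

    Hypothetical helper for illustration; not part of scikit-learn or pandas.
    """
    if len(this_xs) > use_elems:
        # Integer stride; a float step would raise TypeError in Python 3.
        ratio = len(this_xs) // use_elems
        this_xs = this_xs[::ratio]
    # Assumption: the caller wants exactly use_elems rows, so trim any extras
    # the stride left over.
    return this_xs[:use_elems]


larger_class = np.arange(10)
subsample = strided_subsample(larger_class, 4)
print(subsample)  # [0 2 4 6]
```

    Unlike shuffling, this always returns the same rows for the same input, so it is not suitable for drawing N different subsamples without also varying the starting offset.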
    
