Scikit-learn balanced subsampling

终归单人心 2020-12-02 10:34

I'm trying to create N balanced random subsamples of my large unbalanced dataset. Is there a way to do this simply with scikit-learn / pandas or do I have to implement it m
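A minimal pandas-only sketch of what is being asked, assuming the data sits in a DataFrame with a label column and pandas 1.1 or newer (for `DataFrameGroupBy.sample`). The helper name `balanced_subsamples` and its parameters are hypothetical. Each subsample takes, per class, as many rows as the smallest class has, drawn without replacement within that subsample.

```python
import numpy as np
import pandas as pd


def balanced_subsamples(df, label_col, n_subsamples, seed=None):
    """Hypothetical helper: return n_subsamples DataFrames, each with an
    equal number of rows per class (the size of the smallest class)."""
    rng = np.random.default_rng(seed)
    # Size of the smallest class decides how many rows every class contributes
    per_class = df[label_col].value_counts().min()
    subsamples = []
    for _ in range(n_subsamples):
        # Draw per_class rows from each class; a fresh integer seed per
        # subsample makes the subsamples differ while staying reproducible
        sub = df.groupby(label_col).sample(
            n=per_class, random_state=int(rng.integers(2**32 - 1))
        )
        subsamples.append(sub)
    return subsamples
```

Rows are drawn without replacement inside one subsample, but different subsamples may share rows of the smaller classes.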

13 Answers

北荒 (original poster)

2020-12-02 11:04
A slight modification to the top answer by mikkom.

Use this if you want to preserve the ordering of the larger class's data, i.e. you don't want to shuffle it.

Instead of
```
    if len(this_xs) > use_elems:
        np.random.shuffle(this_xs)
```
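do this:
```
    if len(this_xs) > use_elems:
        # Integer stride: a float step (from `/`) raises TypeError in Python 3
        step = len(this_xs) // use_elems
        # Every step-th row keeps the original order; trim to exactly use_elems rows
        this_xs = this_xs[::step][:use_elems]
```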
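For reference, a self-contained NumPy sketch of the whole order-preserving variant. mikkom's original answer is not reproduced on this page, so the function name `balanced_ordered_subsample` and its `x`/`y` interface are hypothetical; only the `this_xs` / `use_elems` stride logic follows the snippet in this answer.

```python
import numpy as np


def balanced_ordered_subsample(x, y):
    """Hypothetical helper: shrink every class to the size of the smallest
    one. Larger classes are thinned with a fixed stride instead of a
    shuffle, so the kept rows stay in their original order."""
    classes, counts = np.unique(y, return_counts=True)
    use_elems = counts.min()
    xs, ys = [], []
    for ci in classes:
        this_xs = x[y == ci]  # rows of this class, in original order
        if len(this_xs) > use_elems:
            step = len(this_xs) // use_elems  # integer stride
            this_xs = this_xs[::step][:use_elems]  # trim to exact size
        xs.append(this_xs)
        ys.append(np.full(use_elems, ci))
    return np.concatenate(xs), np.concatenate(ys)
```

The integer stride spreads the kept rows over the first `step * use_elems` rows of each class; `np.linspace(0, len(this_xs) - 1, use_elems).astype(int)` as an index array would instead spread them over the whole class.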