Random Forest with classes that are very unbalanced


攒了一身酷 2020-12-05 05:40

I am using random forests on a big data problem with a very unbalanced response class, so I read the documentation and found the following parameters:

4 answers

借酒劲吻你 (original poster)

2020-12-05 06:05

Sorry, I don't know how to post a comment on the earlier answer, so I'll create a separate answer.

I suspect the problem is caused by the high imbalance of the dataset (too few cases of one class are present). For each tree, the random forest algorithm draws a bootstrap sample, which serves as the training set for that tree. If the dataset contains too few examples of one class, the bootstrap sample may end up containing examples of only one class (the majority class), and a tree cannot be grown on examples of a single class. There appears to be a limit of 10 unsuccessful sampling attempts. So DWin's suggestion to reduce the degree of imbalance to a lower ratio (1:100 or 1:10) is the most reasonable one.
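To get a feel for how often a bootstrap sample misses the minority class entirely, here is a rough sketch in Python. It is only arithmetic on plain sampling with replacement, not the internals of any random forest package. The function names `prob_no_minority` and `simulate_no_minority` are made up for this illustration, and it assumes the bootstrap sample has the same size as the dataset.

```python
import random


def prob_no_minority(n_total: int, n_minority: int) -> float:
    """Chance that a bootstrap sample of size n_total, drawn with
    replacement from n_total rows, contains no minority-class row."""
    # Each draw misses the minority class with probability 1 - n_minority/n_total,
    # and the n_total draws are independent.
    return (1 - n_minority / n_total) ** n_total


def simulate_no_minority(n_total: int, n_minority: int,
                         trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (illustration only)."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Treat row indices below n_minority as the minority class.
        if all(rng.randrange(n_total) >= n_minority for _ in range(n_total)):
            misses += 1
    return misses / trials


if __name__ == "__main__":
    for n_minority in (1, 10, 100):
        p = prob_no_minority(1000, n_minority)
        print(f"{n_minority} minority rows in 1000: P(no minority in sample) = {p:.3g}")
```

With a single minority row in 1000, the sample misses it a bit more than a third of the time (about 0.37). With 10 minority rows the chance drops to well under 1 in 10,000, which fits the suggestion to bring the imbalance down to something like 1:100 or 1:10.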
