Random Forest with classes that are very unbalanced

攒了一身酷 2020-12-05 05:40

I am using random forests on a big data problem with a very unbalanced response class, so I read the documentation and found the following parameters:



        
4 Answers
  •  借酒劲吻你
    2020-12-05 06:05

    Sorry, I don't know how to post a comment on the earlier answer, so I'll create a separate answer.

    I suspect the problem is caused by the severe imbalance of the dataset (too few cases of one of the classes). For each tree, the random forest algorithm draws a bootstrap sample, which serves as the training set for that tree. If one class has too few examples in the dataset, a bootstrap sample can end up containing examples of only one class (the majority class), and a tree cannot be grown from examples of a single class. There seems to be a limit of 10 unsuccessful sampling attempts. So DWin's suggestion to reduce the degree of imbalance to lower values (1:100 or 1:10) is the most reasonable one.
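To make the argument above concrete, here is a minimal Python sketch, not tied to any particular random forest library. It estimates how likely a per-tree sample drawn with replacement is to miss the minority class entirely, and shows one way to downsample the majority class to roughly 1:10. The function names `prob_no_minority` and `downsample_majority`, the toy class counts, and the per-tree sample size of 1,000 are all assumptions made for illustration; they do not come from the question or from any package.

```python
import random


def prob_no_minority(n_total, n_minority, sample_size=None):
    """Probability that a sample drawn with replacement has no minority rows.

    Each draw misses the minority class with probability 1 - n_minority / n_total,
    and the draws are independent, so the per-draw probability is raised to the
    number of draws.
    """
    if sample_size is None:
        sample_size = n_total  # classic bootstrap: as many draws as rows
    return (1.0 - n_minority / n_total) ** sample_size


def downsample_majority(labels, minority_label, ratio, seed=0):
    """Keep every minority row and at most `ratio` majority rows per minority row.

    Returns the sorted indices of the rows to keep.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]
    n_keep = min(len(majority_idx), ratio * len(minority_idx))
    return sorted(minority_idx + rng.sample(majority_idx, n_keep))


# Toy data (assumed): 5 minority rows among 100,000 majority rows.
labels = [1] * 5 + [0] * 100_000

# Assumed per-tree sample size of 1,000 rows, smaller than the full dataset.
p_before = prob_no_minority(len(labels), 5, sample_size=1_000)

# Reduce the imbalance to 1:10, then take a full-size bootstrap sample.
kept = downsample_majority(labels, minority_label=1, ratio=10)
p_after = prob_no_minority(len(kept), 5)

print(f"P(no minority rows) before: {p_before:.3f}, after 1:10: {p_after:.3f}")
```

Under these assumed numbers, most per-tree samples miss the minority class before downsampling, while after downsampling almost none do. How much the imbalance matters in practice depends on the real per-tree sample size and class counts.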
