Random Forest with classes that are very unbalanced

攒了一身酷 2020-12-05 05:40

I am using random forests in a big data problem, which has a very unbalanced response class, so I read the documentation and I found the following parameters:



        
4 answers
  •  -上瘾入骨i
    2020-12-05 06:02

    You should try using sampling methods that reduce the degree of imbalance from 1:10,000 down to 1:100 or 1:10. You should also reduce the size of the trees that are generated. (At the moment these are recommendations that I am repeating only from memory, but I will see if I can track down more authority than my spongy cortex.)
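
    As a rough illustration of the downsampling idea above, here is a minimal base-R sketch. The toy labels, the class names "majority" and "minority", and the 1:100 target are assumptions made up for the example; they are not taken from the question's data.

```r
# Toy labels: 100,000 majority rows and 10 minority rows (a 1:10,000 imbalance).
set.seed(42)
y <- factor(c(rep("majority", 100000), rep("minority", 10)))

# Keep every minority row and draw at most 100 majority rows per minority row.
target_ratio <- 100
minority_idx <- which(y == "minority")
majority_idx <- which(y == "majority")
n_keep <- min(length(majority_idx), target_ratio * length(minority_idx))
keep <- c(minority_idx, sample(majority_idx, n_keep))

# Class counts after downsampling; `keep` can then index the predictor rows.
table(y[keep])
```

    Downsampling once like this throws majority rows away for good. Stratified sampling inside the forest instead redraws the majority sample for every tree.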

    One way of reducing the size of the trees is to set "nodesize" (the minimum size of terminal nodes) larger. With that degree of imbalance you might need to have the node size really large, say 5-10,000. Here's a thread on R-help: https://stat.ethz.ch/pipermail/r-help/2011-September/289288.html

    In the current state of the question you have sampsize=c(250000,2000), whereas I would have thought that something like sampsize=c(8000,2000) was more in line with my suggestions. I think you are creating samples where you do not have any of the group that was sampled with only 2000.
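
    To make the sampsize and nodesize suggestions concrete, here is a hedged sketch using the randomForest package (assumed to be installed). The simulated data, the class names, and the specific values (ntree = 50, sampsize = c(400, 100), nodesize = 20) are illustrative assumptions scaled down from the 4:1 ratio of c(8000,2000); they are not tuned recommendations.

```r
library(randomForest)  # CRAN package, assumed to be installed

# Simulated data (an assumption for illustration): 20,000 "major" rows, 200 "minor" rows.
set.seed(1)
n_major <- 20000
n_minor <- 200
x <- data.frame(a = rnorm(n_major + n_minor), b = rnorm(n_major + n_minor))
y <- factor(c(rep("major", n_major), rep("minor", n_minor)))
x$a[y == "minor"] <- x$a[y == "minor"] + 2  # give the rare class some signal

# sampsize entries follow the order of levels(strata): here "major", then "minor".
# Each tree is grown on 400 majority and 100 minority rows (4:1 instead of 100:1),
# and nodesize = 20 keeps the individual trees small.
fit <- randomForest(
  x = x, y = y,
  ntree = 50,
  strata = y,
  sampsize = c(400, 100),
  nodesize = 20
)

print(fit$confusion)
```

    Each sampsize entry has to stay at or below the number of rows in its stratum, so check levels(y) and table(y) before choosing the values.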
