What does the parameter 'classwt' in the randomForest function of the randomForest package in R stand for?

Submitted by 岁酱吖の on 2019-12-18 11:45:34

Question


The help page for randomForest::randomForest() says:

"classwt - Priors of the classes. Need not add up to one. Ignored for regression."

Could setting the classwt parameter help when the data are heavily unbalanced, i.e. when the class priors differ strongly?

How should I set classwt when training a model on a dataset with 3 classes whose priors are (p1, p2, p3), while the priors in the test set are (q1, q2, q3)?


Answer 1:


Could setting the classwt parameter help when the data are heavily unbalanced, i.e. when the class priors differ strongly?

Yes, setting values of classwt could be useful for unbalanced datasets. I also agree with joran that these values are transformed into probabilities for sampling the training data (according to Breiman's arguments in his original article).

How should classwt be set when the training dataset has 3 classes with priors (p1, p2, p3), and the priors in the test set are (q1, q2, q3)?

For training you can simply specify

rf <- randomForest(x=x, y=y, classwt=c(p1,p2,p3))

For the test set, no priors can be used: 1) the predict method of the randomForest package has no such option; 2) weights only make sense when training the model, not when predicting.
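The training call above can be tried end to end with a minimal sketch like the following. It assumes the randomForest package is installed and uses the built-in iris data, subsampled to make the classes unbalanced; the subsample sizes, the seed, and the choice of empirical class frequencies as (p1, p2, p3) are illustrative assumptions, not a recommendation from the package.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable

# Illustrative unbalanced training set: 50 setosa, 25 versicolor, 10 virginica
idx <- c(which(iris$Species == "setosa")[1:50],
         which(iris$Species == "versicolor")[1:25],
         which(iris$Species == "virginica")[1:10])
train <- iris[idx, ]
x <- train[, 1:4]
y <- train$Species

# Empirical training priors (p1, p2, p3), assumed to follow the order of levels(y)
p <- as.numeric(table(y)) / length(y)

# Training: priors are supplied through classwt
rf <- randomForest(x = x, y = y, classwt = p)

# Prediction: only the fitted model and new data are passed, no priors
pred <- predict(rf, newdata = iris[-idx, 1:4])
table(pred)
```

Because classwt need not add up to one, passing the raw counts instead of p (for example classwt = c(50, 25, 10)) should describe the same relative priors.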



Source: https://stackoverflow.com/questions/10112678/what-does-the-parameter-classwt-in-randomforest-function-in-randomforest-packa
