Using the predict_proba() function of RandomForestClassifier in the safe and right way


猫巷女王i 2020-12-09 09:07

I'm using scikit-learn to apply machine learning algorithms to my data sets. Sometimes I need the probabilities of labels/classes instead of the labels/classes thems

2 answers

南笙 (original poster)

2020-12-09 09:38

A RandomForestClassifier is a collection of DecisionTreeClassifiers. No matter how big your training set, a decision tree grown until its leaves are pure (the scikit-learn default) simply returns a decision: one class has probability 1, the other classes have probability 0.
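A minimal sketch of that behaviour, assuming a synthetic data set from make_classification (the data, the flip_y label noise, and all variable names are illustrative and not from the question). It compares a fully grown tree with one capped at max_depth=2, where leaves stay mixed and the probabilities become leaf class fractions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data with some label noise (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)

# Default settings: the tree grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
full_proba = full_tree.predict_proba(X)
only_zero_or_one = bool(np.isin(full_proba, [0.0, 1.0]).all())

# A depth-limited tree keeps mixed leaves, so it reports class fractions.
shallow_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
shallow_proba = shallow_tree.predict_proba(X)
has_fractions = bool((~np.isin(shallow_proba, [0.0, 1.0])).any())

print("fully grown tree outputs only 0/1:", only_zero_or_one)
print("depth-2 tree outputs fractional values:", has_fractions)
```

So the "probability 1 or 0" statement holds for fully grown trees; limiting depth or raising min_samples_leaf changes what each tree reports.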

The RandomForest then combines the trees' results. predict_proba() averages the per-tree class probabilities; when each tree chooses exactly one class, as above, this is the number of votes for each class divided by the number of trees in the forest. Hence the resolution of the output is exactly 1/n_estimators. Want more "precision"? Add more estimators. To see variation at the 5th decimal digit you would need 10**5 = 100,000 estimators, which is excessive. You normally don't want more than 100 estimators, and often not that many.
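The 1/n_estimators granularity can be checked with a short sketch like the following, again on an assumed synthetic data set (the data and variable names are hypothetical; only the scikit-learn calls are real). It averages the per-tree outputs by hand and checks that every forest probability is a multiple of 1/n_estimators:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

n_estimators = 10
forest = RandomForestClassifier(n_estimators=n_estimators, random_state=0)
forest.fit(X, y)

samples = X[:20]
proba = forest.predict_proba(samples)

# Average the individual trees' probabilities by hand.
per_tree = np.stack([tree.predict_proba(samples) for tree in forest.estimators_])
manual = per_tree.mean(axis=0)
matches_forest = bool(np.allclose(manual, proba))

# With fully grown trees, each probability is k / n_estimators for an integer k.
scaled = proba * n_estimators
is_multiple = bool(np.allclose(scaled, np.round(scaled)))

print("manual average matches predict_proba:", matches_forest)
print("all values are multiples of 1/n_estimators:", is_multiple)
```

With n_estimators=10 the outputs can only take values such as 0.0, 0.1, 0.2 and so on, which is the coarse "precision" described above.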
