Sklearn: How to balance classification using DecisionTreeClassifier?

Question

I have a data set where the classes are unbalanced. The classes are either 0, 1 or 2.

How can I calculate the prediction error for each class and then re-balance the weights accordingly in Sklearn?

Answer 1:

If you want to fully balance the classes (treat each class as equally important), you can simply pass class_weight='balanced', as stated in the docs:

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
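As a quick check of the formula quoted above, here is a minimal sketch that computes the balanced weights by hand and compares them with scikit-learn's compute_class_weight helper. The label array y is made up for illustration and is not the asker's data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: class 0 is common, classes 1 and 2 are rarer.
y = np.array([0] * 6 + [1] * 3 + [2] * 1)

# The formula from the docs: n_samples / (n_classes * np.bincount(y))
manual = len(y) / (len(np.unique(y)) * np.bincount(y))

# scikit-learn's helper applies the same rule.
auto = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)

print(manual)  # roughly 0.556, 1.111, 3.333
print(auto)

# Weight times count is the same for every class (n_samples / n_classes).
print(manual * np.bincount(y))
```

The rarest class gets the largest weight, so each class contributes the same total weight to the tree's impurity calculations.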

Answer 2:

If class A makes up 10% of the data and class B makes up 90%, class B becomes the dominant class and your decision tree will become biased toward it.

In this case, you can pass a dict {A:9,B:1} to the model to specify the weight of each class, like

from sklearn import tree
clf = tree.DecisionTreeClassifier(class_weight={A: 9, B: 1})  # A and B stand for the class labels

class_weight='balanced' will also work. It automatically sets the weights inversely proportional to the class frequencies.
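A minimal sketch of both options side by side, assuming a made-up two-class data set with a 90/10 split (the features, labels, and random seeds here are illustrative only):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 90 samples of class 0, 10 samples of class 1.
rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)
X = rng.normal(size=(100, 2))
X[y == 1] += 1.5  # shift the minority class so it is partly separable

# Option 1: explicit weights, the minority class counts 9 times as much.
clf_dict = DecisionTreeClassifier(class_weight={0: 1, 1: 9}, random_state=0)
clf_dict.fit(X, y)

# Option 2: let scikit-learn derive the weights from the class frequencies.
clf_balanced = DecisionTreeClassifier(class_weight="balanced", random_state=0)
clf_balanced.fit(X, y)

pred_dict = clf_dict.predict(X)
pred_balanced = clf_balanced.predict(X)
```

With an exact 90/10 split, 'balanced' yields weights in the same 1:9 ratio as the explicit dict, so the two settings express the same relative importance.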

After I used class_weight='balanced', the weighted record count of each class became the same (around 88923). The weights do not add or remove records; they only change how much each record counts.
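The question also asks how to measure the prediction error for each class, which neither answer shows. One possible way, sketched here on synthetic three-class data from make_classification (an assumption, not the asker's data set), is to read the per-class error off the confusion matrix on a held-out split:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, unbalanced data with classes 0, 1 and 2 (illustrative only).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_classes=3,
    weights=[0.7, 0.2, 0.1], random_state=0,
)

# Stratify so every class appears in both the training and the test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1, 2])

# Per-class error: fraction of each true class that was misclassified.
per_class_error = 1 - np.diag(cm) / cm.sum(axis=1)
print(per_class_error)
```

If one class still has a much higher error, its entry in a class_weight dict can be raised and the tree refit. That is a manual tuning loop, not something DecisionTreeClassifier does on its own.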

Source: https://stackoverflow.com/questions/37522191/sklearn-how-to-balance-classification-using-decisiontreeclassifier

Tags

python

machine-learning

scikit-learn

decision-tree
