How does the class_weight parameter in scikit-learn work?

Backend · 2 answers · 1805 views
[愿得一人] 2020-11-29 15:23

I am having a lot of trouble understanding how the class_weight parameter in scikit-learn's Logistic Regression operates.

The Situation

2 Answers
  •  予麋鹿 (OP)
     2020-11-29 15:41

    First off, going by recall alone may be misleading: you can trivially achieve 100% recall by classifying everything as the positive class. I usually suggest using AUC for selecting parameters, and then finding a threshold for the operating point you are interested in (say, a given precision level).
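    A minimal sketch of that workflow, on a hypothetical imbalanced dataset (the data, sizes, and the 0.8 precision target are illustrative assumptions, not from the question):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, precision_recall_curve
    from sklearn.model_selection import train_test_split

    # Hypothetical imbalanced data: ~95% negatives, 5% positives
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    proba = clf.predict_proba(X_te)[:, 1]

    # Compare models by AUC, which is threshold-independent
    auc = roc_auc_score(y_te, proba)

    # Then pick a threshold for a desired operating point, e.g. precision >= 0.8
    precision, recall, thresholds = precision_recall_curve(y_te, proba)
    idx = np.argmax(precision[:-1] >= 0.8)  # first threshold reaching that precision (0 if none does)
    threshold = thresholds[idx]
    y_pred = (proba >= threshold).astype(int)
    ```

    The point is that the decision threshold is chosen after model selection, against the metric you actually care about, rather than being fixed at the default 0.5.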

    For how class_weight works: it penalizes mistakes on samples of class[i] with weight class_weight[i] instead of 1. So a higher class weight means you want to put more emphasis on that class. From what you say, it seems class 0 is 19 times more frequent than class 1, so you should increase the class_weight of class 1 relative to class 0, say {0: .1, 1: .9}. If the class_weight values don't sum to 1, that effectively rescales the data-fit term relative to the penalty, i.e. it changes the regularization strength.
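    To make the "penalizes mistakes with class_weight[i] instead of 1" statement concrete, this sketch (on assumed synthetic data) shows that a class_weight dict gives the same fit as passing the equivalent per-sample weights explicitly:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Assumed synthetic imbalanced data, just for illustration
    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

    # class_weight multiplies each sample's loss term by the weight of its class
    w = {0: 0.1, 1: 0.9}
    clf_cw = LogisticRegression(class_weight=w, max_iter=1000).fit(X, y)

    # The same fit, written out as explicit per-sample weights
    sw = np.where(y == 1, w[1], w[0])
    clf_sw = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sw)

    # Both solvers minimize the same weighted loss, so the coefficients agree
    ```

    In other words, a class weight is just a sample weight applied uniformly to every member of that class.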

    For how class_weight="auto" works, you can have a look at this discussion. In the dev version you can use class_weight="balanced", which is easier to understand: it basically means replicating the smaller class until you have as many samples as in the larger one, but in an implicit way.
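    You can inspect what "balanced" computes via sklearn's compute_class_weight helper; the weights follow n_samples / (n_classes * np.bincount(y)), so each class ends up with the same total effective weight, which is the "implicit replication" described above. The 19:1 class ratio below is taken from the question; the rest is illustrative:

    ```python
    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    # 19:1 imbalance, as in the question (95 negatives, 5 positives)
    y = np.array([0] * 95 + [1] * 5)

    weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

    # "balanced" is defined as n_samples / (n_classes * np.bincount(y))
    manual = len(y) / (2 * np.bincount(y))

    # weight * count is equal across classes: the minority class is implicitly
    # upweighted until both classes contribute the same total loss
    effective = weights * np.bincount(y)
    ```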
