How to interpret almost perfect accuracy and AUC-ROC but zero f1-score, precision and recall

后端未结

关注

 1  1962

长情又很酷 2020-12-12 15:12

I am training ML logistic classifier to classify two classes using python scikit-learn. They are in an extremely imbalanced data (about 14300:1). I\'m getting almost 100% ac

1条回答

失恋的感觉 (楼主)

2020-12-12 15:44
One must understand crucial difference between AUC ROC and "point-wise" metrics like accuracy/precision etc. ROC is a function of a threshold. Given a model (classifier) that outputs the probability of belonging to each class, we predict the class that has the highest probability (support). However, sometimes we can get better scores by changing this rule and requiring one support to be 2 times bigger than the other to actually classify as a given class. This is often true for imbalanced datasets. This way you are actually modifying the learned prior of classes to better fit your data. ROC looks at "what would happen if I change this threshold to all possible values" and then AUC ROC computes the integral of such a curve.

Consequently:
- high AUC ROC vs low f1 or other "point" metric, means that your classifier currently does a bad job, however you can find the threshold for which its score is actually pretty decent
- low AUC ROC and low f1 or other "point" metric, means that your classifier currently does a bad job, and even fitting a threshold will not change it
- high AUC ROC and high f1 or other "point" metric, means that your classifier currently does a decent job, and for many other values of threshold it would do the same
- low AUC ROC vs high f1 or other "point" metric, means that your classifier currently does a decent job, however for many other values of threshold - it is pretty bad
0 讨论(0)
发布评论:

提交评论
- 加载中...