Getting a low ROC AUC score but a high accuracy

挽巷 2020-11-27 19:16

I'm using the LogisticRegression class in scikit-learn on a version of the flight delay dataset.

I use pandas to select some columns

  •  甜味超标
    2020-11-27 19:46

    I don't know exactly what AIR_DEL15 is, which you use as your label (it is not in the original data). My guess is that it is imbalanced, i.e., there are many more 0s than 1s. In such a case, accuracy is not a meaningful metric, and you should use precision, recall, and the confusion matrix instead (see also this thread).
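    A minimal sketch of these checks follows. The arrays y_true and y_pred are hypothetical stand-ins for your label column and your model's predictions (87 negatives and 13 positives, with a model that catches only 3 of the positives). It shows how a respectable-looking accuracy can coexist with poor recall on the minority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical, imbalanced labels: 87 zeros and 13 ones
y_true = np.array([0] * 87 + [1] * 13)
# Hypothetical predictions: all zeros are correct, only 3 of the 13 ones are found
y_pred = np.array([0] * 87 + [1] * 3 + [0] * 10)

# Check the class balance first
classes, counts = np.unique(y_true, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 87, 1: 13}

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
print(accuracy, precision, recall)
```

    Here accuracy is 0.90 and precision is 1.0, but recall is only 3/13 (about 0.23): the model misses most of the delayed flights, which accuracy alone hides.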

    Just as an extreme example, if 87% of your labels are 0s, you can get an 87% accuracy "classifier" simply (and naively) by classifying all samples as 0; in such a case, you would also get a low AUC (fairly close to 0.5, as in your case).
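    The extreme example can be reproduced with a sketch like the one below, assuming a hypothetical label vector with 870 zeros and 130 ones. scikit-learn's DummyClassifier with strategy="most_frequent" always predicts the majority class and gives every sample the same positive-class score, so it gets 87% accuracy while its ROC AUC is exactly 0.5 (no better than random ranking).

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical imbalanced labels: 87% zeros, 13% ones
y = np.array([0] * 870 + [1] * 130)
# Features are irrelevant for this baseline, so a dummy column is enough
X = np.zeros((len(y), 1))

# A "classifier" that always predicts the most frequent class (0)
clf = DummyClassifier(strategy="most_frequent").fit(X, y)

acc = accuracy_score(y, clf.predict(X))
# ROC AUC needs a score for the positive class; here all scores are identical
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1])
print(acc, auc)
```

    Because every sample receives the same score, the ROC curve is the diagonal from (0, 0) to (1, 1), which is why the AUC is 0.5 even though accuracy matches the share of the majority class.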

    For a more general (and much needed, in my opinion) discussion of what exactly AUC is, see my other answer.
