How to compare ROC AUC scores of different binary classifiers and assess statistical significance in Python? (p-value, confidence interval)
问题 I would like to compare different binary classifiers in Python. For that, I want to calculate the ROC AUC scores, measure the 95% confidence interval (CI) , and p-value to access statistical significance. Below is a minimal example in scikit-learn which trains three different models on a binary classification dataset, plots the ROC curves and calculates the AUC scores. Here are my specific questions: How to calculate the 95% confidence interval (CI) of the ROC AUC scores on the test set? (e.g