scikit-learn - ROC curve with confidence intervals

旧时难觅i 2021-01-30 01:40

I am able to get a ROC curve using scikit-learn with fpr, tpr, thresholds = metrics.roc_curve(y_true, y_pred, pos_label=1), which gives me the curve itself. How can I compute a confidence interval for it (or for the ROC AUC)?

2 Answers
  •  清歌不尽
    2021-01-30 02:34

    You can bootstrap the ROC computations (sample new versions of y_true / y_pred with replacement from the original y_true / y_pred and recompute roc_curve each time) and then estimate a confidence interval this way.

    To take the variability induced by the train / test split into account, you can also use the ShuffleSplit CV iterator many times, fit a model on each train split, generate y_pred with each model, and thus gather an empirical distribution of roc_curve results as well, from which you can compute confidence intervals.
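
    As a rough sketch of that second approach (assuming a LogisticRegression classifier, make_classification data and the ShuffleSplit parameters purely for illustration; substitute your own estimator and X / y):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import ShuffleSplit
    
    # Illustrative data and model: replace with your own X, y and estimator.
    X, y = make_classification(n_samples=500, random_state=0)
    clf = LogisticRegression(max_iter=1000)
    
    # Refit the model on many random train / test splits and collect the
    # ROC AUC obtained on each test split.
    cv = ShuffleSplit(n_splits=100, test_size=0.25, random_state=0)
    cv_scores = []
    for train_idx, test_idx in cv.split(X):
        clf.fit(X[train_idx], y[train_idx])
        y_score = clf.predict_proba(X[test_idx])[:, 1]
        cv_scores.append(roc_auc_score(y[test_idx], y_score))
    
    cv_scores = np.sort(cv_scores)
    print("90% interval across splits: [{:0.3f} - {:0.3f}]".format(
        cv_scores[int(0.05 * len(cv_scores))],
        cv_scores[int(0.95 * len(cv_scores))]))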

    Edit: bootstrapping in Python

    Here is an example for bootstrapping the ROC AUC score out of the predictions of a single model. I chose to bootstrap the ROC AUC to make it easier to follow as a Stack Overflow answer, but it can be adapted to bootstrap the whole curve instead (see the sketch at the end of this answer):

    import numpy as np
    from sklearn.metrics import roc_auc_score
    
    y_pred = np.array([0.21, 0.32, 0.63, 0.35, 0.92, 0.79, 0.82, 0.99, 0.04])
    y_true = np.array([0,    1,    0,    0,    1,    1,    0,    1,    0   ])
    
    print("Original ROC area: {:0.3f}".format(roc_auc_score(y_true, y_pred)))
    
    n_bootstraps = 1000
    rng_seed = 42  # control reproducibility
    bootstrapped_scores = []
    
    rng = np.random.RandomState(rng_seed)
    for i in range(n_bootstraps):
        # bootstrap by sampling with replacement on the prediction indices
        indices = rng.randint(0, len(y_pred), len(y_pred))
        if len(np.unique(y_true[indices])) < 2:
            # We need at least one positive and one negative sample for ROC AUC
            # to be defined: reject the sample
            continue
    
        score = roc_auc_score(y_true[indices], y_pred[indices])
        bootstrapped_scores.append(score)
        print("Bootstrap #{} ROC area: {:0.3f}".format(i + 1, score))
    

    You can see that we need to reject some invalid resamples. However, on real data with many predictions this is a very rare event and should not impact the confidence interval significantly (you can vary the rng_seed to check).

    Here is the histogram:

    import matplotlib.pyplot as plt
    plt.hist(bootstrapped_scores, bins=50)
    plt.title('Histogram of the bootstrapped ROC AUC scores')
    plt.show()
    

    (figure: histogram of the bootstrapped ROC AUC scores)

    Note that the resampled scores are censored in the [0, 1] range, causing a high number of scores in the last bin.

    To get a confidence interval one can sort the samples:

    sorted_scores = np.array(bootstrapped_scores)
    sorted_scores.sort()
    
    # Computing the lower and upper bound of the 90% confidence interval
    # You can change the bounds percentiles to 0.025 and 0.975 to get
    # a 95% confidence interval instead.
    confidence_lower = sorted_scores[int(0.05 * len(sorted_scores))]
    confidence_upper = sorted_scores[int(0.95 * len(sorted_scores))]
    print("Confidence interval for the score: [{:0.3f} - {:0.3}]".format(
        confidence_lower, confidence_upper))
    

    which gives:

    Confidence interval for the score: [0.444 - 1.0]
    

    The confidence interval is very wide, but this is probably a consequence of my choice of predictions (3 mistakes out of 9 predictions) and of the small total number of predictions.

    Another remark on the plot: the scores are quantized (many empty histogram bins). This is a consequence of the small number of predictions. One could introduce a bit of Gaussian noise on the scores (or the y_pred values) to smooth the distribution and make the histogram look better, but then the choice of the smoothing bandwidth is tricky.
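
    For example, a quick way to do that (reusing np, rng, plt and bootstrapped_scores from the snippets above; the noise scale of 0.01 is an arbitrary bandwidth choice):

    # Jitter the bootstrapped scores with a little Gaussian noise before
    # plotting; the standard deviation plays the role of a smoothing bandwidth.
    noise_scale = 0.01  # arbitrary bandwidth choice
    smoothed_scores = (np.array(bootstrapped_scores)
                       + rng.normal(0, noise_scale, len(bootstrapped_scores)))
    plt.hist(smoothed_scores, bins=50)
    plt.title('Histogram of the jittered bootstrapped ROC AUC scores')
    plt.show()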

    Finally, as stated earlier, this confidence interval is specific to your training set. To get a better estimate of the variability of the ROC induced by your model class and parameters, you should do iterated cross-validation instead. However, this is often much more costly as you need to train a new model for each random train / test split.
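
    As mentioned at the top, the same resampling loop can be adapted to bootstrap the whole ROC curve rather than just the AUC, for instance by interpolating each resampled curve onto a common grid of false positive rates and taking pointwise percentiles. A rough sketch (reusing y_true, y_pred, n_bootstraps, rng, np and plt from above; the grid size and the 5% / 95% pointwise band are illustrative choices):

    from sklearn.metrics import roc_curve
    
    # Evaluate each bootstrapped ROC curve on a common FPR grid so that
    # pointwise percentiles can be taken across the resampled curves.
    fpr_grid = np.linspace(0, 1, 101)
    bootstrapped_tprs = []
    for i in range(n_bootstraps):
        indices = rng.randint(0, len(y_pred), len(y_pred))
        if len(np.unique(y_true[indices])) < 2:
            continue  # ROC is undefined without both classes in the resample
        fpr, tpr, _ = roc_curve(y_true[indices], y_pred[indices])
        bootstrapped_tprs.append(np.interp(fpr_grid, fpr, tpr))
    
    tprs = np.array(bootstrapped_tprs)
    tpr_median = np.percentile(tprs, 50, axis=0)
    tpr_lower = np.percentile(tprs, 5, axis=0)
    tpr_upper = np.percentile(tprs, 95, axis=0)
    
    plt.plot(fpr_grid, tpr_median, label='median bootstrapped ROC')
    plt.fill_between(fpr_grid, tpr_lower, tpr_upper, alpha=0.3,
                     label='90% pointwise band')
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.legend()
    plt.show()

    Note that such pointwise percentile bands describe the variability at each FPR value separately; they are not a simultaneous confidence band for the whole curve, but they are usually good enough to visualize the uncertainty.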
