I have trouble understanding the difference (if there is one) between roc_auc_score() and auc() in scikit-learn. I'm trying to predict a binary output.
When you use y_pred (class labels), you have already decided on the threshold. When you use y_prob (positive-class probabilities), you are still open to choosing the threshold, and the ROC curve should help you decide it.
In the first case, you are using the probabilities:

from sklearn.metrics import roc_curve, auc, roc_auc_score

y_probs = clf.predict_proba(xtest)[:, 1]
fp_rate, tp_rate, thresholds = roc_curve(y_true, y_probs)
auc(fp_rate, tp_rate)
When you do that, you are computing the AUC before committing to any particular threshold.
In the second case, you are using the predicted class labels rather than the probabilities. If you use 'predict' instead of 'predict_proba' and pass the same predictions to both functions, you get the same result:
y_pred = clf.predict(xtest)
fp_rate, tp_rate, thresholds = roc_curve(y_true, y_pred)
print(auc(fp_rate, tp_rate))
# 0.857142857143
print(roc_auc_score(y_true, y_pred))
# 0.857142857143