Creating scorer for Brier Score Loss in scikit-learn

Question

I'm trying to use GridSearchCV and RandomizedSearchCV in scikit-learn (0.16.1) with logistic regression and random forest classifiers (and possibly others down the road) on binary classification problems. I got GridSearchCV working with the standard LogisticRegression classifier, but I cannot get LogisticRegressionCV (or RandomizedSearchCV with RandomForestClassifier) to work with a custom scoring function, specifically brier_score_loss. I have tried this code:

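from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import brier_score_loss, make_scorer
lrcv = LogisticRegressionCV(scoring=make_scorer(brier_score_loss, greater_is_better=False, needs_proba=True, needs_threshold=False, pos_label=1))
lrcv_clf = lrcv.fit(X=X_train, y=y_train)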

But I keep getting errors that essentially say brier_score_loss is receiving input (y_prob) with 2 columns, which fails with a bad input shape error. Is there a way to use only the second column of y_prob (lrcv.predict_proba) so that the Brier score can be calculated? I thought pos_label might help, but apparently not. Do I need to avoid make_scorer and just create my own scoring function?

Thanks for any suggestions!

Answer 1:

For a binary problem, predict_proba returns two probabilities for every sample: the first column is the probability of class 0 and the second is the probability of class 1. You have to choose the column you need and pass only that one to the scoring function.
I do this with a simple proxy function:

def ProbaScoreProxy(y_true, y_probs, class_idx, proxied_func, **kwargs):
    # Forward only the probability column of the chosen class to the wrapped metric
    return proxied_func(y_true, y_probs[:, class_idx], **kwargs)

That can be used like this:

from sklearn import metrics
scorer = metrics.make_scorer(ProbaScoreProxy, greater_is_better=False, needs_proba=True, class_idx=1, proxied_func=metrics.brier_score_loss)

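For binary classification, class_idx can be 0 or 1. brier_score_loss expects the probability of the positive class, which is column 1 when the labels are 0 and 1.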
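
On the last part of the question (skipping make_scorer entirely): scikit-learn also accepts a plain callable with the signature scorer(estimator, X, y) as a scoring argument. The following is a minimal sketch of that approach, not the library's own implementation. The names neg_brier_scorer and ConstantEstimator are hypothetical, the Brier score is computed by hand so the sketch has no dependencies, and the stub estimator exists only to exercise the function.

```python
def neg_brier_scorer(estimator, X, y, pos_label=1):
    """Scorer with the (estimator, X, y) signature; returns the negated Brier score."""
    # Locate the predict_proba column that belongs to the positive class
    col = list(estimator.classes_).index(pos_label)
    probs = [row[col] for row in estimator.predict_proba(X)]
    # Brier score: mean squared difference between the predicted
    # probability and the 0/1 outcome
    outcomes = [1.0 if label == pos_label else 0.0 for label in y]
    brier = sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)
    # Negate so that a larger value means a better model
    return -brier


class ConstantEstimator:
    """Hypothetical stand-in for a fitted classifier (assumption, not a scikit-learn class)."""
    classes_ = [0, 1]

    def predict_proba(self, X):
        # Always predicts P(class 0) = 0.75 and P(class 1) = 0.25
        return [[0.75, 0.25] for _ in X]


score = neg_brier_scorer(ConstantEstimator(), X=[[0], [1]], y=[0, 1])
print(score)
```

A function like this can be passed directly, for example scoring=neg_brier_scorer, to GridSearchCV, RandomizedSearchCV or LogisticRegressionCV. In real use you would probably call metrics.brier_score_loss on the selected column instead of computing the mean by hand. Newer scikit-learn releases also provide a built-in "neg_brier_score" scoring string, so it is worth checking what your installed version offers.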

Source: https://stackoverflow.com/questions/29664657/creating-scorer-for-brier-score-loss-in-scikit-learn

Tags

python

scikit-learn