Question
I find the following part of scikit-learn's (0.18.1) documentation a bit confusing. It seems that you can write your own scoring function in more than one way. What is the difference between them?
GridSearchCV takes a scoring argument as a:
scorer callable object / function with signature scorer(estimator, X, y)
This option is also supported in the model evaluation docs.
Conversely, make_scorer wants a score_func as a:
score function (or loss function) with signature score_func(y, y_pred, **kwargs)
Example
Both GridSearchCV(scoring=dummy_scorer) and GridSearchCV(scoring=make_scorer(dummy_scorer2)) print what is expected.
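# Scorer signature: receives the fitted estimator plus the data.
def dummy_scorer(estimator, X, y):
    print(X)
    print(y)
    return 1

# Score-function signature: receives true and predicted values only.
def dummy_scorer2(y1, y2):
    print(y1)
    print(y2)
    return 1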
Answer 1:
scikit-learn has various utility functions (precision_score, recall_score, accuracy_score, etc.) that take the actual and predicted values directly and calculate the result. Even a custom metric needs the actual and predicted values in most cases, so its signature has to be (y, y_pred, ...).
Now, in techniques like GridSearch or RandomizedSearch, the score on the cross-validated data has to be computed automatically. As the estimator and X keep changing (X changes with each cross-validation split), so do the predicted values and the corresponding actual values.
So scorer(estimator, X, y) makes sense: take the estimator and X, call estimator.predict(X) to get the predicted output, compare it with the actual values (y), and calculate the result.
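To make the two signatures concrete, here is a minimal sketch that scores the same fitted estimator both ways. The name manual_accuracy_scorer and the toy data are made up for illustration; DummyClassifier is used only because its predictions are easy to reason about.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, make_scorer


def manual_accuracy_scorer(estimator, X, y):
    # Hypothetical hand-written scorer: predict first, then compare with y.
    y_pred = estimator.predict(X)
    return accuracy_score(y, y_pred)


# Toy data (assumption for illustration only).
X = np.array([[0], [1], [2], [3]])
y = np.array([0, 0, 0, 1])

# Always predicts the most frequent training label, which is 0 here.
clf = DummyClassifier(strategy="most_frequent").fit(X, y)

# make_scorer turns the (y, y_pred) metric into a (estimator, X, y) scorer.
wrapped_scorer = make_scorer(accuracy_score)

manual_value = manual_accuracy_scorer(clf, X, y)
wrapped_value = wrapped_scorer(clf, X, y)
print(manual_value, wrapped_value)
```

Both calls should give the same number, since the wrapped scorer does the predict-then-compare step internally.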
What make_scorer() does is simply return a callable scorer object that wraps score_func and does everything described above.
We can verify this in the scikit-learn source code:
Line 347:
    cls = _PredictScorer
    return cls(score_func, sign, kwargs)
Here cls is the _PredictScorer class, whose __call__ method contains this code:
Line 100:
    y_pred = estimator.predict(X)
    if sample_weight is not None:
        return self._sign * self._score_func(y_true, y_pred,
                                             sample_weight=sample_weight,
                                             **self._kwargs)
    else:
        return self._sign * self._score_func(y_true, y_pred, **self._kwargs)
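The wrapping shown in the source excerpts can be imitated in a few lines of plain Python. This is a simplified sketch, not scikit-learn's actual code: SimplePredictScorer, simple_make_scorer, AlwaysZero and fraction_correct are hypothetical names, and sample_weight handling is left out.

```python
class SimplePredictScorer(object):
    """Simplified, hypothetical stand-in for scikit-learn's _PredictScorer."""

    def __init__(self, score_func, sign, kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def __call__(self, estimator, X, y_true):
        # Same idea as the excerpt: predict, then delegate to score_func.
        y_pred = estimator.predict(X)
        return self._sign * self._score_func(y_true, y_pred, **self._kwargs)


def simple_make_scorer(score_func, greater_is_better=True, **kwargs):
    # Losses are negated so that "higher is better" always holds.
    sign = 1 if greater_is_better else -1
    return SimplePredictScorer(score_func, sign, kwargs)


class AlwaysZero(object):
    """Toy estimator (assumption): predicts 0 for every sample."""

    def predict(self, X):
        return [0 for _ in X]


def fraction_correct(y_true, y_pred):
    # A score function with the (y, y_pred) signature.
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / float(len(y_true))


X = [[0], [1], [2], [3]]
y = [0, 0, 0, 1]

scorer = simple_make_scorer(fraction_correct)
negated_scorer = simple_make_scorer(fraction_correct, greater_is_better=False)

score = scorer(AlwaysZero(), X, y)
negated_score = negated_scorer(AlwaysZero(), X, y)
print(score, negated_score)
```

The object returned by simple_make_scorer has the scorer(estimator, X, y) signature that GridSearchCV expects, while the function passed in keeps the score_func(y, y_pred, **kwargs) signature.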
Also, when you pass a string such as "accuracy" or "precision" as the scoring parameter of GridSearchCV, it is first converted into a scorer(estimator, X, y, ...) callable built with make_scorer, which can be verified in the same source file.
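One way to check the string case yourself is to look the name up with sklearn.metrics.get_scorer and compare it with an explicit make_scorer call. This is only a sketch on made-up toy data; it assumes the "accuracy" name, which is one of the predefined scoring strings.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, get_scorer, make_scorer

# Toy data (assumption for illustration only).
X = np.array([[0], [1], [2], [3], [4]])
y = np.array([1, 1, 1, 0, 0])

# Always predicts the most frequent training label, which is 1 here.
clf = DummyClassifier(strategy="most_frequent").fit(X, y)

# Scorer resolved from the string name.
named_scorer = get_scorer("accuracy")
# Scorer built by hand from the underlying metric.
explicit_scorer = make_scorer(accuracy_score)

named_value = named_scorer(clf, X, y)
explicit_value = explicit_scorer(clf, X, y)
print(named_value, explicit_value)
```

Both scorers are called as scorer(estimator, X, y) and should agree on the result.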
Hope this makes sense. Feel free to ask if you have any doubts or questions about it.
Source: https://stackoverflow.com/questions/43523210/scorer-function-difference-between-make-scorer-score-func-and