Question
I am using this code to compare performance of a number of models:
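from sklearn import model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X = ...  # input data
Y = ...  # binary labels

models = []
models.append(('LR', LogisticRegression()))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))

results = []
names = []
scoring = 'accuracy'
for name, model in models:
    # random_state has no effect without shuffle=True (recent scikit-learn
    # versions reject the combination), so it is omitted here
    kfold = model_selection.KFold(n_splits=10)
    cv_results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
    results.append(cv_results)
    names.append(name)
    msg = "%s: %.2f (%.2f)" % (name, cv_results.mean(), cv_results.std())
    print(msg)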
I can use 'accuracy' and 'recall' as scoring, which give accuracy and sensitivity. How can I create a scorer that gives me specificity?
Specificity = TN / (TN + FP)
where TN and FP are the true negative and false positive counts in the confusion matrix.
I have tried this:
from sklearn.metrics import confusion_matrix, make_scorer

def tp(y_true, y_pred):
    error = confusion_matrix(y_true, y_pred)[0, 0] / (confusion_matrix(y_true, y_pred)[0, 0] + confusion_matrix(y_true, y_pred)[0, 1])
    return error

my_scorer = make_scorer(tp, greater_is_better=True)
and then
cv_results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=my_scorer)
but it does not work for n_splits >= 10. I get this error when my_scorer is computed:
IndexError: index 1 is out of bounds for axis 1 with size 1
Answer 1:
If you change the recall_score parameter for a binary classifier to pos_label=0, you get specificity (the default, pos_label=1, gives sensitivity):
from sklearn.metrics import accuracy_score, make_scorer, recall_score

scoring = {
    'accuracy': make_scorer(accuracy_score),
    'sensitivity': make_scorer(recall_score),
    'specificity': make_scorer(recall_score, pos_label=0)
}
Hope this helps.
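As a rough sketch of how this dictionary might be used: cross_val_score accepts only a single scorer, so the example below passes the dictionary to cross_validate instead. The synthetic data from make_classification, the LogisticRegression settings, and the use of StratifiedKFold are assumptions made for illustration, not part of the original answer. The first few lines also check on hypothetical toy labels that recall with pos_label=0 matches TN / (TN + FP).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, make_scorer, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Hypothetical toy labels (0 = negative, 1 = positive), for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0])

# With labels=[0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
specificity_from_cm = tn / (tn + fp)
# Recall of class 0 = correctly predicted negatives / all true negatives.
specificity_from_recall = recall_score(y_true, y_pred, pos_label=0)

# Synthetic binary data standing in for the real X and Y (an assumption).
X, Y = make_classification(n_samples=200, n_features=5, random_state=0)

scoring = {
    "accuracy": make_scorer(accuracy_score),
    "sensitivity": make_scorer(recall_score),               # recall of class 1
    "specificity": make_scorer(recall_score, pos_label=0),  # recall of class 0
}

# Stratified folds keep both classes present in every test fold.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
cv_results = cross_validate(
    LogisticRegression(max_iter=1000), X, Y, cv=kfold, scoring=scoring
)

for metric in scoring:
    fold_scores = cv_results["test_" + metric]
    print("%s: %.2f (%.2f)" % (metric, fold_scores.mean(), fold_scores.std()))
```

cross_validate returns one array per metric under the key "test_<name>", so all three metrics come from the same folds and the same fitted models.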
Answer 2:
scikit-learn has no dedicated specificity function, but you can get the false positive rate (fpr), which is related to it by:
fpr = 1 - specificity
So to get specificity, you just need to subtract fpr from 1.
fpr can be calculated using roc_curve:
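import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([1, 1, 2, 2])
y_pred = np.array([0.1, 0.4, 0.35, 0.8])  # predicted scores
# The labels are {1, 2} rather than {0, 1}, so pos_label must be given explicitly
fpr, tpr, thresholds = roc_curve(y_true, y_pred, pos_label=2)
print(fpr)
# e.g. [0.  0.  0.5 0.5 1. ] (the number of points depends on the scikit-learn version)
specificity = 1 - fpr
# e.g. [1.  1.  0.5 0.5 0. ]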
For the above to work, you first need to train the model and obtain y_pred (predicted scores or probabilities) from it.
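A minimal sketch of that step, assuming synthetic data from make_classification and a LogisticRegression model (both chosen only for illustration): the model is fitted on a training split, its predicted probabilities for class 1 serve as the scores, and roc_curve then gives one specificity value per threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the real X and Y (an assumption).
X, Y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, stratify=Y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# roc_curve expects continuous scores; here, the predicted probability of class 1.
scores = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, scores, pos_label=1)

# One specificity value per threshold, not a single number.
specificity = 1 - fpr
print(np.column_stack([thresholds, specificity]))
```

Because the result is an array indexed by threshold, it describes the whole trade-off curve. A single specificity figure requires fixing one threshold, which is what the model's predict method does implicitly.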
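If you want to use this inside cross_val_score, note that a scorer must return a single number, while roc_curve returns one fpr per threshold. Since the scorer receives hard class predictions, you can compute the fpr at that single operating point in a custom scorer like this:
import numpy as np
from sklearn.metrics import make_scorer

def specificity(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # fpr = FP / (FP + TN): share of true negatives (label 0) predicted as 1
    fpr = np.mean(y_pred[y_true == 0] == 1)
    return 1 - fpr

scorer = make_scorer(specificity)
And then:
cv_results = model_selection.cross_val_score(model, X, Y, cv=kfold, scoring=scorer)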
NOTE: The above code will only give correct results for binary y.
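One likely explanation for the IndexError in the question is that a test fold contained only one class in both y_true and y_pred, in which case confusion_matrix returns a 1x1 array and index [0, 1] does not exist. The sketch below shows one way around this, under two assumptions: the labels are 0 (negative) and 1 (positive), and stratified, shuffled folds are acceptable. The function name specificity_score and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import StratifiedKFold, cross_val_score


def specificity_score(y_true, y_pred):
    """TN / (TN + FP), assuming labels 0 (negative) and 1 (positive)."""
    # labels=[0, 1] forces a 2x2 matrix even if a fold lacks one class.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    # Without any true negatives specificity is undefined; report NaN.
    return tn / (tn + fp) if (tn + fp) > 0 else float("nan")


specificity_scorer = make_scorer(specificity_score, greater_is_better=True)

# Synthetic binary data standing in for the real X and Y (an assumption).
X, Y = make_classification(n_samples=200, n_features=5, random_state=0)

# Stratified, shuffled folds keep both classes in every test fold.
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
cv_results = cross_val_score(
    LogisticRegression(max_iter=1000), X, Y, cv=kfold, scoring=specificity_scorer
)
print("specificity: %.2f (%.2f)" % (cv_results.mean(), cv_results.std()))
```

Passing labels=[0, 1] fixes the shape of the confusion matrix, while StratifiedKFold addresses the underlying cause by ensuring each fold contains negatives to score.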
Source: https://stackoverflow.com/questions/47704133/how-to-define-specificity-as-a-callable-scorer-for-model-evaluation