Scikit-learn: How to obtain True Positive, True Negative, False Positive and False Negative


If you have two lists that have the predicted and actual values; as it appears you do, you can pass them to a function that will calculate TP, FP, TN, FN with something like this:

def perf_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)):
        if y_actual[i] == y_hat[i] == 1:               # predicted 1, actually 1
            TP += 1
        if y_hat[i] == 1 and y_actual[i] != y_hat[i]:  # predicted 1, actually 0
            FP += 1
        if y_actual[i] == y_hat[i] == 0:               # predicted 0, actually 0
            TN += 1
        if y_hat[i] == 0 and y_actual[i] != y_hat[i]:  # predicted 0, actually 1
            FN += 1

    return TP, FP, TN, FN

From here I think you will be able to calculate the rates of interest to you, and other performance measures such as specificity and sensitivity.
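For example, a minimal sketch using the function above (the label arrays here are made up for illustration):

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_hat    = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

TP, FP, TN, FN = perf_measure(y_actual, y_hat)
print(TP, FP, TN, FN)         # 3 3 4 1

sensitivity = TP / (TP + FN)  # true positive rate, 0.75
specificity = TN / (TN + FP)  # true negative rate, ~0.57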

For the multi-class case, everything you need can be found in the confusion matrix. With scikit-learn's convention, actual classes run along the rows and predicted classes along the columns, so for each class: TP is that class's diagonal entry, FP is the rest of its column, FN is the rest of its row, and TN is everything else.

Using numpy, you can compute this for all classes at once like so:

import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred)  # rows: actual class, columns: predicted class

FP = cm.sum(axis=0) - np.diag(cm)  # column totals minus the diagonal
FN = cm.sum(axis=1) - np.diag(cm)  # row totals minus the diagonal
TP = np.diag(cm)
TN = cm.sum() - (FP + FN + TP)

# Sensitivity, hit rate, recall, or true positive rate
TPR = TP/(TP+FN)
# Specificity or true negative rate
TNR = TN/(TN+FP) 
# Precision or positive predictive value
PPV = TP/(TP+FP)
# Negative predictive value
NPV = TN/(TN+FN)
# Fall out or false positive rate
FPR = FP/(FP+TN)
# False negative rate
FNR = FN/(TP+FN)
# False discovery rate
FDR = FP/(TP+FP)

# Overall accuracy
ACC = (TP+TN)/(TP+FP+FN+TN)
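For instance, here is how those vectorized formulas play out on a small three-class example (the labels below are made up for illustration):

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 2, 2, 0, 1, 1, 2, 0]
y_pred = [0, 2, 2, 2, 0, 1, 0, 1, 0]

cm = confusion_matrix(y_true, y_pred)

TP = np.diag(cm)                   # [3 1 2], one entry per class
FP = cm.sum(axis=0) - np.diag(cm)  # [1 1 1]
FN = cm.sum(axis=1) - np.diag(cm)  # [0 2 1]
TN = cm.sum() - (TP + FP + FN)     # [5 5 5]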

According to scikit-learn documentation,

http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix

By definition a confusion matrix C is such that C[i, j] is equal to the number of observations known to be in group i but predicted to be in group j.

Thus in binary classification, the count of true negatives is C[0,0], false negatives is C[1,0], true positives is C[1,1] and false positives is C[0,1].

CM = confusion_matrix(y_true, y_pred)

TN = CM[0][0]
FN = CM[1][0]
TP = CM[1][1]
FP = CM[0][1]

You can obtain all of the parameters from the confusion matrix. Following scikit-learn's convention quoted above (rows are actual classes, columns are predicted classes), the structure of the binary confusion matrix (a 2x2 matrix) is as follows:

TN|FP
FN|TP

So

TN = cm[0][0]
FP = cm[0][1]
FN = cm[1][0]
TP = cm[1][1]

More details at https://en.wikipedia.org/wiki/Confusion_matrix

The one-liner to get true positives etc. out of the confusion matrix is to ravel it:

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0]
y_pred = [1, 0, 1, 0]   

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 1 1 1 1
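One caveat worth flagging: if y_true and y_pred happen to contain only one of the two classes, confusion_matrix returns a 1x1 matrix and the four-way unpacking fails. Passing the labels explicitly keeps the shape fixed at 2x2:

tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()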

In scikit-learn's metrics module there is a confusion_matrix function which gives you the desired output.

You can use any classifier that you want. Here I use KNeighborsClassifier as an example.

from sklearn import metrics, neighbors

X_train, y_train = ..., ...  # your training data
X_test, y_test = ..., ...    # your test data

clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, y_train)    # the classifier must be fitted before predicting

expected = y_test
predicted = clf.predict(X_test)

conf_matrix = metrics.confusion_matrix(expected, predicted)

print(conf_matrix)
# [[1403   87]
#  [  56 3159]]

The docs: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix
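If you want a variant of the above that runs end to end, here is a sketch using the built-in breast cancer dataset purely as stand-in data (the dataset choice and the split are assumptions for illustration, not part of the original answer):

from sklearn import datasets, metrics, neighbors
from sklearn.model_selection import train_test_split

# Stand-in binary classification data
X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = neighbors.KNeighborsClassifier()
clf.fit(X_train, y_train)

predicted = clf.predict(X_test)
print(metrics.confusion_matrix(y_test, predicted))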

You can try sklearn.metrics.classification_report as below:

import sklearn.metrics

y_true = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

print(sklearn.metrics.classification_report(y_true, y_pred))

output:

             precision    recall  f1-score   support

          0       0.80      0.57      0.67         7
          1       0.50      0.75      0.60         4

avg / total       0.69      0.64      0.64        11
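If you need those numbers programmatically rather than as printed text, classification_report also accepts output_dict=True (available in scikit-learn 0.20 and later):

report = sklearn.metrics.classification_report(y_true, y_pred, output_dict=True)
print(report['1']['recall'])     # 0.75
print(report['0']['precision'])  # 0.8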

I wrote a version that works using only numpy. I hope it helps you.

import numpy as np

def perf_metrics_2X2(yobs, yhat):
    """
    Returns the sensitivity, specificity, positive predictive value, and
    negative predictive value
    of a 2X2 table.

    where:
    0 = negative case
    1 = positive case

    Parameters
    ----------
    yobs : numpy array of positive and negative ``observed`` cases
    yhat : numpy array of positive and negative ``predicted`` cases

    Returns
    -------
    sensitivity  = TP / (TP+FN)
    specificity  = TN / (TN+FP)
    pos_pred_val = TP / (TP+FP)
    neg_pred_val = TN / (TN+FN)

    Author: Julio Cardenas-Rodriguez
    """
    # Count each cell of the 2x2 table with boolean masks
    TP = np.sum((yobs == 1) & (yhat == 1))
    TN = np.sum((yobs == 0) & (yhat == 0))
    FP = np.sum((yobs == 0) & (yhat == 1))
    FN = np.sum((yobs == 1) & (yhat == 0))

    sensitivity  = TP / (TP + FN)
    specificity  = TN / (TN + FP)
    pos_pred_val = TP / (TP + FP)
    neg_pred_val = TN / (TN + FN)

    return sensitivity, specificity, pos_pred_val, neg_pred_val
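A quick usage sketch (note that the comparisons assume numpy arrays, so plain Python lists must be converted first):

yobs = np.array([1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0])
yhat = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0])

print(perf_metrics_2X2(yobs, yhat))
# (0.75, 0.5714285714285714, 0.5, 0.8)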

If you have more than one class in your classifier, you might want to use pandas-ml at that point. The ConfusionMatrix of pandas-ml gives more detailed information; check it out (a minimal sketch follows below).
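Roughly, usage looks like this (assuming pandas-ml is installed; its API may vary between versions):

from pandas_ml import ConfusionMatrix

cm = ConfusionMatrix(y_true, y_pred)
cm.print_stats()  # per-class TP/FP/TN/FN plus many derived statistics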

I think some of the earlier answers here were not fully correct, so it is worth computing the counts by hand. For example, suppose that we have the following arrays:

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]

y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

If we compute the FP, FN, TP and TN values manually, they should be as follows:

FP: 3 FN: 1 TP: 3 TN: 4

However, the original version of the accepted answer gave the results as follows:

FP: 1 FN: 3 TP: 3 TN: 4

That is not correct, because a false positive is a case where the actual value is 0 but the prediction is 1, not the opposite. The same applies to false negatives.

And an answer that originally laid the confusion matrix out as TP|FP over FN|TN computed the results as follows:

FP: 3 FN: 1 TP: 4 TN: 3

There the true positive and true negative counts are swapped, because scikit-learn puts TN, not TP, at cm[0][0].

Am I correct with my computations? Please let me know if I am missing something.
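The manual counts above can be checked directly against scikit-learn:

from sklearn.metrics import confusion_matrix

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_predic = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_actual, y_predic).ravel()
print(fp, fn, tp, tn)  # 3 1 3 4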

I tried some of the other answers and found that they did not work.

This works for me:

from sklearn.metrics import classification_report

print(classification_report(y_test, predicted)) 

Here's a fix to invoketheshell's code (the accepted answer), which in its original form swapped the false positive and false negative conditions:

def performance_measure(y_actual, y_hat):
    TP = 0
    FP = 0
    TN = 0
    FN = 0

    for i in range(len(y_hat)): 
        if y_actual[i] == y_hat[i]==1:
            TP += 1
        if y_hat[i] == 1 and y_actual[i] == 0:
            FP += 1
        if y_hat[i] == y_actual[i] == 0:
            TN +=1
        if y_hat[i] == 0 and y_actual[i] == 1:
            FN +=1

    return TP, FP, TN, FN
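Running this on the example arrays from earlier in the thread confirms the manual counts:

y_actual = [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
y_hat    = [1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

print(performance_measure(y_actual, y_hat))  # (3, 3, 4, 1)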