How to write a confusion matrix in Python?

太阳男子 2020-12-04 06:48

I wrote code to calculate a confusion matrix in Python:

def conf_mat(prob_arr, input_arr):
    # confusion matrix: rows = actual class (0/1), columns = predicted class
    conf_arr = [[0, 0], [0, 0]]
    for prob, actual in zip(prob_arr, input_arr):
        conf_arr[actual][int(prob >= 0.5)] += 1  # predict 1 when prob >= 0.5 (assumed threshold)
    return conf_arr
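
For example, with made-up probabilities and 0/1 labels (illustrative values only):

probs  = [0.9, 0.8, 0.3, 0.6, 0.1]
labels = [1, 1, 0, 1, 0]
print(conf_mat(probs, labels))  # [[2, 0], [0, 3]]
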
14 Answers
  • 2020-12-04 07:43

    Here is a simple implementation that handles an unequal number of classes in the predicted and actual labels (see examples 3 and 4). I hope this helps!

    For folks just learning this, here's a quick review: the column labels indicate the predicted class, and the row labels indicate the actual class. In example 1, the top row is [3 1]. Rows indicate truth, so the true label for this row is "0", and there are 4 examples whose ground-truth label is "0". Columns indicate predictions, so 3/4 of those samples were correctly labeled "0", while 1/4 was incorrectly labeled "1".

    import numpy as np

    def confusion_matrix(actual, predicted):
        # classes = sorted union of the labels seen in either array
        classes       = np.unique(np.concatenate((actual, predicted)))
        confusion_mtx = np.zeros((len(classes), len(classes)), dtype=int)  # np.int was removed in NumPy 1.24
        for i, a in enumerate(classes):
            for j, p in enumerate(classes):
                # count samples whose true label is a and predicted label is p
                confusion_mtx[i, j] = np.sum((actual == a) & (predicted == p))
        return confusion_mtx
    

    Example 1:

    actual    = np.array([1,1,1,1,0,0,0,0])
    predicted = np.array([1,1,1,1,0,0,0,1])
    confusion_matrix(actual,predicted)
    
       0  1
    0  3  1
    1  0  4
    

    Example 2:

    actual    = np.array(["a","a","a","a","b","b","b","b"])
    predicted = np.array(["a","a","a","a","b","b","b","a"])
    confusion_matrix(actual,predicted)
    
       0  1
    0  4  0
    1  1  3
    

    Example 3:

    actual    = np.array(["a","a","a","a","b","b","b","b"])
    predicted = np.array(["a","a","a","a","b","b","b","z"]) # <-- notice the 3rd class, "z"
    confusion_matrix(actual,predicted)
    
       0  1  2
    0  4  0  0
    1  0  3  1
    2  0  0  0
    

    Example 4:

    actual    = np.array(["a","a","a","x","x","b","b","b"]) # <-- notice the 4th class, "x"
    predicted = np.array(["a","a","a","a","b","b","b","z"])
    confusion_matrix(actual,predicted)
    
       0  1  2  3
    0  3  0  0  0
    1  0  2  0  1
    2  1  1  0  0
    3  0  0  0  0
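
    Note that the function itself returns a plain NumPy array; the labeled tables above are only for readability. If you want the labels attached, here is a minimal sketch using pandas (assuming it is available):

    import pandas as pd

    classes = np.unique(np.concatenate((actual, predicted)))
    print(pd.DataFrame(confusion_matrix(actual, predicted),
                       index=classes, columns=classes))  # rows = actual, columns = predicted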
    
  • 2020-12-04 07:44

    Scikit-learn provides a confusion_matrix function:

    from sklearn.metrics import confusion_matrix
    y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
    y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
    confusion_matrix(y_actu, y_pred)
    

    which outputs a NumPy array:

    array([[3, 0, 0],
           [0, 1, 2],
           [2, 1, 3]])
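
    Newer scikit-learn releases (0.22 and later, if I remember the version right) also accept labels to pin the class order and normalize to scale the counts; normalize='true' divides each row by the number of samples in that actual class:

    confusion_matrix(y_actu, y_pred, labels=[0, 1, 2], normalize='true')

    array([[1.        , 0.        , 0.        ],
           [0.        , 0.33333333, 0.66666667],
           [0.33333333, 0.16666667, 0.5       ]])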
    

    But you can also create a confusion matrix using Pandas:

    import pandas as pd
    y_actu = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2], name='Actual')
    y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2], name='Predicted')
    df_confusion = pd.crosstab(y_actu, y_pred)
    

    You will get a (nicely labeled) Pandas DataFrame:

    Predicted  0  1  2
    Actual
    0          3  0  0
    1          0  1  2
    2          2  1  3
    
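    Since the matrix is just counts, overall accuracy is the diagonal total divided by the grand total; with the unmargined table above:

    import numpy as np
    acc = np.diag(df_confusion).sum() / df_confusion.to_numpy().sum()  # (3+1+3)/12 = 0.5833...
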

    If you add margins=True like

    df_confusion = pd.crosstab(y_actu, y_pred, rownames=['Actual'], colnames=['Predicted'], margins=True)
    

    you will also get the sum of each row and column:

    Predicted  0  1  2  All
    Actual
    0          3  0  0    3
    1          0  1  2    3
    2          2  1  3    6
    All        5  2  5   12
    

    You can also get a row-normalized confusion matrix, where each row is divided by its total (the number of samples in that actual class). Note that plain df_confusion / df_confusion.sum(axis=1) aligns the sums against the columns and silently normalizes the wrong axis; use div with axis=0 instead:

    df_conf_norm = df_confusion.div(df_confusion.sum(axis=1), axis=0)

    Predicted         0         1         2
    Actual
    0          1.000000  0.000000  0.000000
    1          0.000000  0.333333  0.666667
    2          0.333333  0.166667  0.500000

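    More recent pandas versions can do this normalization inside crosstab itself; normalize='index' divides each row by its sum:

    pd.crosstab(y_actu, y_pred, normalize='index')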

    You can plot this confusion matrix using:

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_confusion_matrix(df_confusion, title='Confusion matrix', cmap=plt.cm.gray_r):
        plt.matshow(df_confusion, cmap=cmap)  # imshow also works here
        #plt.title(title)
        plt.colorbar()
        tick_marks = np.arange(len(df_confusion.columns))
        plt.xticks(tick_marks, df_confusion.columns, rotation=45)
        plt.yticks(tick_marks, df_confusion.index)
        #plt.tight_layout()
        plt.ylabel(df_confusion.index.name)
        plt.xlabel(df_confusion.columns.name)

    plot_confusion_matrix(df_confusion)

    [image: plot of the confusion matrix]

    Or plot the normalized confusion matrix using:

    plot_confusion_matrix(df_conf_norm)  
    

    [image: plot of the normalized confusion matrix]
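
    If you are already using scikit-learn, ConfusionMatrixDisplay.from_predictions (added in scikit-learn 1.0) produces a similar plot without a hand-rolled helper; a minimal sketch:

    from sklearn.metrics import ConfusionMatrixDisplay
    ConfusionMatrixDisplay.from_predictions(y_actu, y_pred)
    plt.show()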

    You might also be interested in the pandas-ml project (https://github.com/pandas-ml/pandas-ml) and its pip package (https://pypi.python.org/pypi/pandas_ml).

    With this package, a confusion matrix can be pretty-printed and plotted. You can binarize a confusion matrix and get class statistics such as TP, TN, FP, FN, ACC, TPR, FPR, FNR, TNR (SPC), LR+, LR-, DOR, PPV, FDR, FOR, NPV, as well as some overall statistics:

    In [1]: from pandas_ml import ConfusionMatrix
    In [2]: y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
    In [3]: y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]
    In [4]: cm = ConfusionMatrix(y_actu, y_pred)
    In [5]: cm.print_stats()
    Confusion Matrix:
    
    Predicted  0  1  2  __all__
    Actual
    0          3  0  0        3
    1          0  1  2        3
    2          2  1  3        6
    __all__    5  2  5       12
    
    
    Overall Statistics:
    
    Accuracy: 0.583333333333
    95% CI: (0.27666968568210581, 0.84834777019156982)
    No Information Rate: ToDo
    P-Value [Acc > NIR]: 0.189264302376
    Kappa: 0.354838709677
    Mcnemar's Test P-Value: ToDo
    
    
    Class Statistics:
    
    Classes                                        0          1          2
    Population                                    12         12         12
    P: Condition positive                          3          3          6
    N: Condition negative                          9          9          6
    Test outcome positive                          5          2          5
    Test outcome negative                          7         10          7
    TP: True Positive                              3          1          3
    TN: True Negative                              7          8          4
    FP: False Positive                             2          1          2
    FN: False Negative                             0          2          3
    TPR: (Sensitivity, hit rate, recall)           1  0.3333333        0.5
    TNR=SPC: (Specificity)                 0.7777778  0.8888889  0.6666667
    PPV: Pos Pred Value (Precision)              0.6        0.5        0.6
    NPV: Neg Pred Value                            1        0.8  0.5714286
    FPR: False-out                         0.2222222  0.1111111  0.3333333
    FDR: False Discovery Rate                    0.4        0.5        0.4
    FNR: Miss Rate                                 0  0.6666667        0.5
    ACC: Accuracy                          0.8333333       0.75  0.5833333
    F1 score                                    0.75        0.4  0.5454545
    MCC: Matthews correlation coefficient  0.6831301  0.2581989  0.1690309
    Informedness                           0.7777778  0.2222222  0.1666667
    Markedness                                   0.6        0.3  0.1714286
    Prevalence                                  0.25       0.25        0.5
    LR+: Positive likelihood ratio               4.5          3        1.5
    LR-: Negative likelihood ratio                 0       0.75       0.75
    DOR: Diagnostic odds ratio                   inf          4          2
    FOR: False omission rate                       0        0.2  0.4285714
    
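    These per-class figures follow directly from the matrix counts, and you can cross-check them with scikit-learn; precision_recall_fscore_support reproduces the PPV and TPR rows above:

    from sklearn.metrics import precision_recall_fscore_support
    precision_recall_fscore_support(y_actu, y_pred)
    # precision per class: [0.6, 0.5, 0.6]; recall per class: [1.0, 0.333..., 0.5]
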

    I also noticed that a new Python library for confusion matrices, PyCM, has been released: maybe you can have a look.
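
    A minimal sketch of PyCM usage (the class and parameter names are taken from the project's documentation, so treat this as illustrative):

    from pycm import ConfusionMatrix
    cm = ConfusionMatrix(actual_vector=y_actu, predict_vector=y_pred)
    print(cm)  # pretty-prints the matrix and a battery of statistics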
