How can I analyze a confusion matrix?

北恋 2021-01-19 05:04

When I print out scikit-learn's confusion matrix, I get a very large matrix. I want to work out which values are the true positives, true negatives, etc. How can I do so? This is

3 answers
  •  情书的邮戳
    2021-01-19 05:55

    Approach 1: Binary Classification

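    from sklearn.metrics import confusion_matrix
    import pandas as pd
    
    y_test = [1, 0, 0]
    y_pred = [1, 0, 0]
    
    # Rows are the actual classes, columns the predicted classes.
    # labels=[0, 1] keeps a 2x2 layout even if a class is absent from the data.
    cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
    
    # Label the rows and columns so the layout is explicit
    print(pd.DataFrame(cm,
                       index=["Actual 0", "Actual 1"],
                       columns=["Predicted 0", "Predicted 1"]))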
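
    Since the question asks for the true/false positives and negatives directly: in scikit-learn's confusion matrix, row i is the actual class and column j is the predicted class, so for a binary problem the flattened cells are TN, FP, FN, TP in that order. The sketch below assumes class 1 is the positive class; the sample labels are made up for illustration, and precision, recall and accuracy are computed by hand only to show how the four counts relate to those metrics.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration; class 1 is treated as the positive class
y_test = [1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]

# labels=[0, 1] fixes the order to [negative, positive], so ravel() yields
# the cells row by row: TN, FP (actual 0), then FN, TP (actual 1)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # of everything predicted as 1, the share that really is 1
recall = tp / (tp + fn)     # of everything that really is 1, the share that was found
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

    For this data the counts are TN=2, FP=1, FN=1, TP=2. The precision_score and recall_score functions in sklearn.metrics should give the same values as the hand-computed ones.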
    

    Approach 2: Multiclass Classification

    sklearn.metrics.confusion_matrix returns only an unlabeled numeric array; to get a labeled 'report' with row and column totals, you can use pandas.crosstab:

    import pandas as pd
    y_true = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
    y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])
    
    pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)
    

    which results in:

    Predicted  0  1  2  All
    True                   
    0          3  0  0    3
    1          0  1  2    3
    2          2  1  3    6
    All        5  2  5   12
    

    This allows us to see that:

    1. The diagonal elements show the number of correct classifications for each class: 3, 1 and 3 for the classes 0, 1 and 2.
    2. The off-diagonal elements show the misclassifications: for example, 2 samples of class 2 were misclassified as 0, while no samples of class 0 were misclassified as 2.
    3. The "All" column gives the number of samples in each true class (from y_true), and the "All" row gives the number of predictions for each class (from y_pred).
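
    To get true/false positives and negatives in the multiclass case, each class can be treated one-vs-rest: the diagonal cell is its TP, the rest of its column is FP, the rest of its row is FN, and every remaining cell is TN. A minimal NumPy sketch using the same y_true and y_pred as the crosstab example (variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]

# Rows = true class, columns = predicted class (same layout as the crosstab)
cm = confusion_matrix(y_true, y_pred)

# One-vs-rest counts, one entry per class
tp = np.diag(cm)                # correctly predicted as the class
fp = cm.sum(axis=0) - tp        # predicted as the class, but actually another class
fn = cm.sum(axis=1) - tp        # actually the class, but predicted as another class
tn = cm.sum() - (tp + fp + fn)  # samples where the class is neither true nor predicted

for cls in range(cm.shape[0]):
    print(f"class {cls}: TP={tp[cls]} FP={fp[cls]} FN={fn[cls]} TN={tn[cls]}")
```

    For this data TP = [3, 1, 3], FP = [2, 1, 2], FN = [0, 2, 3] and TN = [7, 8, 4] for classes 0, 1 and 2. scikit-learn's multilabel_confusion_matrix in sklearn.metrics returns these per-class counts as a stack of 2x2 matrices ([[TN, FP], [FN, TP]]) and should agree with the manual computation.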

    This method also works for text labels, and for datasets with many samples it can be extended to report percentages instead of raw counts, for example with the normalize parameter of pd.crosstab.
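
    As a sketch of both points, the example below maps the numeric classes from the crosstab example to hypothetical text labels ("cat", "dog" and "bird" are made up) and passes normalize="index" to pd.crosstab, so every cell is divided by its row total and the diagonal becomes the fraction of each true class that was classified correctly. Multiplying by 100 turns the fractions into percentages.

```python
import pandas as pd

# Hypothetical text labels standing in for the numeric classes 0, 1 and 2
names = {0: "cat", 1: "dog", 2: "bird"}
y_true = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]).map(names)
y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]).map(names)

# normalize="index" divides every cell by its row total,
# so each row (one true class) sums to 1
pct = pd.crosstab(y_true, y_pred, rownames=["True"], colnames=["Predicted"],
                  normalize="index")

print((pct * 100).round(1))
```

    Row-wise normalization answers "what happened to the samples of each true class". With normalize="columns" each column is divided by its total instead (the share of each predicted class that was correct), and normalize="all" divides by the grand total.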
