When I print out scikit-learn's confusion matrix, I get a very large matrix. I want to work out the true positives, true negatives, etc. How can I do so? This is
Approach 1: Binary Classification
from sklearn.metrics import confusion_matrix
import pandas as pd

y_test = [1, 0, 0]
y_pred = [1, 0, 0]

# labels=[0, 1] fixes the row/column order and guarantees a 2x2 result
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

# Rows are the actual classes, columns are the predicted classes
pd.DataFrame(cm, index=["Actual 0", "Actual 1"], columns=["Predicted 0", "Predicted 1"])
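For the binary case, the four counts can also be read straight out of the matrix. This is a minimal sketch, assuming the labels are 0 (negative) and 1 (positive); the sample data is made up for illustration and is not from the question:

```python
from sklearn.metrics import confusion_matrix

# Illustrative data only; 1 is treated as the positive class.
y_test = [1, 0, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0]

# With labels=[0, 1] the matrix is laid out as [[TN, FP], [FN, TP]],
# so ravel() yields the counts in that order.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

If your positive class is not 1, the order of the unpacked values changes with the order you pass in `labels`, so it is worth checking against a small example like this one first.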
Approach 2: Multiclass Classification
While sklearn.metrics.confusion_matrix provides a numeric matrix, you can generate a 'report' using the following:
import pandas as pd
y_true = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])
pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'], margins=True)
which results in:
Predicted  0  1  2  All
True
0          3  0  0    3
1          0  1  2    3
2          2  1  3    6
All        5  2  5   12
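This allows us to see that:
- The diagonal cells give the number of correct classifications for each class: 3, 1 and 3 for classes 0, 1 and 2.
- The off-diagonal cells give the misclassifications: for example, 2 samples of class 2 were predicted as class 0.
- The "All" subtotals give the total number of samples of each class in y_true and y_pred.
This method also works for text labels and, for datasets with a large number of samples, can be extended to provide percentage reports.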
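To get true/false positives and negatives out of a multiclass table, one common reading is one-vs-rest: each class in turn is treated as "positive" and all the others as "negative". The sketch below assumes that interpretation and reuses the same data; the names `counts`, `per_class` and `percent` are just illustrative. It also shows one way to get a percentage report, using the `normalize` argument of `pd.crosstab`.

```python
import numpy as np
import pandas as pd

# Same labels as in the crosstab example.
y_true = pd.Series([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
y_pred = pd.Series([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])

# Counts without the "All" margins, so only class-vs-class cells remain.
counts = pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'])

# Force a square table so the diagonal lines up even if a class is never predicted.
classes = sorted(set(y_true) | set(y_pred))
counts = counts.reindex(index=classes, columns=classes, fill_value=0)

# One-vs-rest counts per class.
tp = pd.Series(np.diag(counts.to_numpy()), index=classes)
fp = counts.sum(axis=0) - tp          # predicted as this class, actually another
fn = counts.sum(axis=1) - tp          # actually this class, predicted as another
tn = counts.to_numpy().sum() - tp - fp - fn

per_class = pd.DataFrame({'TP': tp, 'FP': fp, 'FN': fn, 'TN': tn})
print(per_class)

# Row percentages: share of each true class assigned to each predicted class.
percent = pd.crosstab(y_true, y_pred, rownames=['True'], colnames=['Predicted'],
                      normalize='index') * 100
print(percent.round(1))
```

With many classes, `per_class` stays one row per class, which is usually easier to scan than the full matrix.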