Sklearn - plotting classification report gives a different output than basic avg?

安稳与你 提交于 2020-05-15 21:23:13

问题


I wanted to leverage this answer How to plot scikit learn classification report? turning an sklearn classification report into a heatmap.

It's all working with their sample report, however my classification report looks slightly different and is thus screwing up the functions.

Their report (notice the avg / total):

sampleClassificationReport =             
                   precision    recall  f1-score   support

          Acacia        0.62      1.00      0.76        66
          Blossom       0.93      0.93      0.93        40
          Camellia      0.59      0.97      0.73        67
          Daisy         0.47      0.92      0.62       272
          Echium        1.00      0.16      0.28       413

        avg / total     0.77      0.57      0.49       858

My report with metrics.classification_report(valid_y, y_pred) :

              precision    recall  f1-score   support

           0       1.00      0.18      0.31        11
           1       0.00      0.00      0.00        14
           2       0.00      0.00      0.00        19
           3       0.50      0.77      0.61        66
           4       0.39      0.64      0.49        47
           5       0.00      0.00      0.00        23

    accuracy                           0.46       180
   macro avg       0.32      0.27      0.23       180
weighted avg       0.35      0.46      0.37       180

The issue, from the selected answer in the heatmap link, is here:

for line in lines[2 : (len(lines) - 2)]:
    t = line.strip().split()
    if len(t) < 2: continue
    classes.append(t[0])
    v = [float(x) for x in t[1: len(t) - 1]]
    support.append(int(t[-1]))
    class_names.append(t[0])
    print(v)
    plotMat.append(v)

Because I get the error:

ValueError: could not convert string to float: 'avg'

So the problem truly is how my classification report is being outputted. What can I change here to match the sample?

EDIT: what Ive tried:

df = pd.DataFrame(metrics.classification_report(valid_y, y_pred)).T

df['support'] = df.support.apply(int)

df.style.background_gradient(cmap='viridis',
                             subset=pd.IndexSlice['0':'9', :'f1-score'])

Error:

ValueError: DataFrame constructor not properly called!


回答1:


With the advent of output_dict param in classification_report, there is no hassle for parsing the report. You can directly use the output of classification report to be read as pd.DataFrame. Then, you could use the pd.Style option to render the heat map.

Example:

from sklearn.metrics import classification_report
import numpy as np
import pandas as pd

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV


X, y = make_classification(n_samples=1000, n_features=30,
                           n_informative=12,
                           n_clusters_per_class=1, n_classes=10,
                           class_sep=2.0, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y)


clf = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)



df = pd.DataFrame(classification_report(clf.predict(X_test), 
                                        y_test, digits=2,
                                        output_dict=True)).T

df['support'] = df.support.apply(int)

df.style.background_gradient(cmap='viridis',
                             subset=pd.IndexSlice['0':'9', :'f1-score'])



来源:https://stackoverflow.com/questions/61705257/sklearn-plotting-classification-report-gives-a-different-output-than-basic-avg

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!