Using a confusion matrix as the scoring metric in cross-validation in scikit-learn

野性不改 2021-01-31 11:15

I am creating a pipeline in scikit-learn:

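from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('classifier', BernoulliNB()),
])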

a

5 Answers
  •  忘了有多久
    2021-01-31 11:55

    I am new to machine learning. If I understand correctly, a confusion matrix is built from four values: TP, FN, FP and TN. These four values cannot be obtained directly from scoring, but they are implied by accuracy, precision and recall.

    That leaves four unknowns (TP, FN, FP and TN) related by three equations:

    Eq1 : tp/(tp+fp)=P

    Eq2 : tp/(tp+fn)=R

    Eq3 : (tp+tn)/(tp+fn+fp+tn)=A


    If one of the unknowns is fixed at 1 (here TP = 1), three unknowns and three equations remain, so the other values can be solved relative to TP as a system of equations.
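
As a sketch of the algebra, assuming tp is the unknown fixed at 1 and using the same symbols P, R and A as in Eq1 to Eq3, the three equations rearrange as follows:

```latex
\begin{aligned}
R &= \frac{1}{1 + fn} &&\Rightarrow\quad fn = \frac{1}{R} - 1 \\
P &= \frac{1}{1 + fp} &&\Rightarrow\quad fp = \frac{1}{P} - 1 \\
A &= \frac{1 + tn}{1 + fn + fp + tn} &&\Rightarrow\quad tn = \frac{A\,(1 + fn + fp) - 1}{1 - A}
\end{aligned}
```

The results are ratios to tp (fn/tp, fp/tp, tn/tp) rather than absolute counts, and the expressions are undefined when R = 0, P = 0 or A = 1.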

    1. P, R and A can be obtained from scoring.

    2. cross_validate can compute all three scores in one call.

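    def calculate_confusion_matrix_by_assume_tp_equal_to_1(r, p, a):
        """Return (fn, fp, tn) relative to tp = 1, given recall r, precision p and accuracy a."""
        # tp/(tp+fp)=P, tp/(tp+fn)=R, (tp+tn)/(tp+fn+fp+tn)=A
        # Undefined when r == 0, p == 0 or a == 1 (division by zero).
        fn = (1 / r) - 1  # from R = 1 / (1 + fn)
        fp = (1 / p) - 1  # from P = 1 / (1 + fp)
        tn = (1 - a - a * fn - a * fp) / (a - 1)  # from A = (1 + tn) / (1 + fn + fp + tn)
        return fn, fp, tn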
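
A minimal sketch of how the pieces above might fit together, not a definitive implementation. It first checks the function against a hand-made confusion matrix, then asks `cross_validate` for precision, recall and accuracy in one call and converts each fold's scores. The toy `texts` and `labels` are hypothetical and not from the question; the function is restated only so the block runs on its own. The conversion is applied per fold, because scores averaged across folds need not correspond to any single confusion matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline


def calculate_confusion_matrix_by_assume_tp_equal_to_1(r, p, a):
    # Restated from the answer above so this block is self-contained.
    fn = (1 / r) - 1
    fp = (1 / p) - 1
    tn = (1 - a - a * fn - a * fp) / (a - 1)
    return fn, fp, tn


# Sanity check on a known matrix: tp=8, fn=2, fp=4, tn=6.
tp, fn, fp, tn = 8, 2, 4, 6
recall = tp / (tp + fn)
precision = tp / (tp + fp)
accuracy = (tp + tn) / (tp + fn + fp + tn)
check = calculate_confusion_matrix_by_assume_tp_equal_to_1(recall, precision, accuracy)
print(check)  # the ratios fn/tp, fp/tp, tn/tp, up to floating-point rounding

# Hypothetical toy data (an assumption for illustration): 1 = spam, 0 = ham.
texts = [
    "win cash now", "free prize inside", "claim your free cash",
    "cheap loans win big", "free money now", "win a free prize",
    "meeting at noon", "see you at lunch", "project status update",
    "lunch meeting moved", "notes from the meeting", "status of the project",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("classifier", BernoulliNB()),
])

# One call returns all three scores, with one value per fold.
scores = cross_validate(
    pipeline, texts, labels, cv=3,
    scoring=["precision", "recall", "accuracy"],
)

per_fold = []
for p, r, a in zip(scores["test_precision"], scores["test_recall"], scores["test_accuracy"]):
    if p == 0 or r == 0 or a == 1:
        # The tp = 1 normalisation breaks down here (division by zero).
        per_fold.append(None)
    else:
        per_fold.append(calculate_confusion_matrix_by_assume_tp_equal_to_1(r, p, a))
print(per_fold)
```

Folds where a score is 0 or accuracy is exactly 1 are recorded as `None`, since the relative values cannot be recovered from those scores.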
    
