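I am creating a pipeline in scikit-learn:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import BernoulliNB
    from sklearn.pipeline import Pipeline

    pipeline = Pipeline([
        ('bow', CountVectorizer()),
        ('classifier', BernoulliNB()),
    ])
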
I am new to machine learning. If I understand correctly, the confusion matrix is built from four values: TP, FN, FP, and TN. These four values cannot be obtained directly from scoring, but they are implied by accuracy, precision, and recall.
That leaves four unknowns (TP, FN, FP, and TN) and three equations:
Eq1 : tp/(tp+fp)=P
Eq2 : tp/(tp+fn)=R
Eq3 : (tp+tn)/(tp+fn+fp+tn)=A
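
As a sanity check on these three definitions, here is a minimal sketch that computes P, R, and A from the four counts. The function name `metrics_from_counts` and the counts themselves are made up for illustration; they do not come from my pipeline.

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Return (precision, recall, accuracy) for a binary confusion matrix."""
    precision = tp / (tp + fp)                   # Eq1
    recall = tp / (tp + fn)                      # Eq2
    accuracy = (tp + tn) / (tp + fn + fp + tn)   # Eq3
    return precision, recall, accuracy


# Made-up counts, purely for illustration.
p, r, a = metrics_from_counts(tp=40, fn=10, fp=20, tn=30)
# p = 2/3, r = 0.8, a = 0.7
print(p, r, a)
```

Going in this direction is easy; the question is about going the other way, from P, R, and A back to the counts.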
Assuming one of the unknowns is 1, this becomes a system of three equations in three unknowns, so the values of the other three relative to it can be solved for.
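
Written out, the substitution might look like this. It assumes tp = 1 is the unknown that gets fixed (which presumes there is at least one true positive), and that P and R are nonzero and A is not exactly 1:

```latex
\begin{align*}
\frac{1}{1 + fp} = P \quad &\Rightarrow \quad fp = \frac{1}{P} - 1 \\
\frac{1}{1 + fn} = R \quad &\Rightarrow \quad fn = \frac{1}{R} - 1 \\
\frac{1 + tn}{1 + fn + fp + tn} = A \quad &\Rightarrow \quad tn = \frac{1 - A - A\,fn - A\,fp}{A - 1}
\end{align*}
```

The results are ratios relative to tp, not absolute counts.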
P, R, and A can be obtained from scoring, and cross_validate can compute all three scores in one call.

    def calculate_confusion_matrix_by_assume_tp_equal_to_1(r, p, a):
        # Solve tp/(tp+fp)=P, tp/(tp+fn)=R, (tp+tn)/(tp+fn+fp+tn)=A with tp = 1.
        # Requires r > 0, p > 0, and a != 1 to avoid division by zero.
        fn = (1 / r) - 1
        fp = (1 / p) - 1
        tn = (1 - a - a * fn - a * fp) / (a - 1)
        return fn, fp, tn
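
The three scores fed to the function above could be collected roughly as follows. This is only a sketch: the toy corpus, the labels, and `cv=2` are invented for illustration, and averaging each score across folds is my own assumption about how to reduce the per-fold results to single values.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: 1 = spam, 0 = not spam.
texts = [
    "free money now", "win cash prize", "free prize win", "cash money free",
    "meeting at noon", "lunch with team", "project meeting notes", "team lunch noon",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ('bow', CountVectorizer()),
    ('classifier', BernoulliNB()),
])

# Passing a list of scorer names makes cross_validate return one
# 'test_<name>' array per metric, with one entry per fold.
scores = cross_validate(
    pipeline, texts, labels,
    scoring=['precision', 'recall', 'accuracy'],
    cv=2,
)

# Assumption: reduce the per-fold scores to one value each by averaging.
p = scores['test_precision'].mean()
r = scores['test_recall'].mean()
a = scores['test_accuracy'].mean()
print(p, r, a)
```

The averaged `r`, `p`, and `a` could then be passed to `calculate_confusion_matrix_by_assume_tp_equal_to_1`. Since that function divides by `r`, `p`, and `a - 1`, a score of exactly 0 or an accuracy of exactly 1 would need separate handling.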