auc

Use TensorFlow loss Global Objectives (recall_at_precision_loss) with Keras (not metrics)

Submitted by ☆樱花仙子☆ on 2020-07-21 03:31:05
Question: Background: I have a multi-label classification problem with 5 labels (e.g. [1 0 1 1 0]). I therefore want my model to improve on metrics such as fixed recall, precision-recall AUC, or ROC AUC. It doesn't make sense to use a loss function (e.g. binary_crossentropy) that is not directly related to the performance measure I want to optimize, so I want to use TensorFlow's global_objectives.recall_at_precision_loss() or similar as the loss function. Relevant GitHub: https://github.com

Different results from roc_auc_score and plot_roc_curve

Submitted by 让人想犯罪 __ on 2020-05-31 04:07:02
Question: I am training a RandomForestClassifier (sklearn) to predict credit card fraud. When I test the model and check the ROC AUC score, I get different values from roc_auc_score and plot_roc_curve: roc_auc_score gives me around 0.89, while plot_roc_curve computes the AUC as 0.96. Why is that? The labels are all 0 or 1, and the predictions are 0 or 1 as well. Code: clf = RandomForestClassifier(random_state=42) clf.fit(X_train, y_train[target].values) pred_test = clf.predict(X_test) print(roc_auc
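The discrepancy usually comes from what each function receives: plot_roc_curve takes the fitted estimator and internally uses its probability scores, while the snippet passes the hard 0/1 output of clf.predict to roc_auc_score. A minimal sketch with made-up labels and scores (not the asker's data) showing how thresholding lowers the reported AUC:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic ground truth and predicted probabilities (illustrative only)
y_true = np.array([0, 0, 1, 1, 0, 1])
y_proba = np.array([0.2, 0.6, 0.55, 0.9, 0.3, 0.45])

# Hard 0/1 predictions, as returned by clf.predict(X_test)
y_hard = (y_proba >= 0.5).astype(int)

auc_proba = roc_auc_score(y_true, y_proba)  # uses the full ranking
auc_hard = roc_auc_score(y_true, y_hard)    # only one threshold survives

print(auc_proba, auc_hard)  # the probability-based AUC is higher here
```

Passing clf.predict_proba(X_test)[:, 1] to roc_auc_score should reproduce the plot_roc_curve value.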

Why am I getting good accuracy but low ROC AUC for multiple models?

Submitted by 别说谁变了你拦得住时间么 on 2020-04-18 05:39:08
Question: My dataset is 42542 x 14, and I am building models such as logistic regression, KNN, random forest, and decision trees to compare their accuracies. I get high accuracy but low ROC AUC for every model. About 85% of the samples have target variable 1 and 15% have target variable 0. I tried sampling to handle this imbalance, but it still gives the same results. The glm coefficients are as follows: glm(formula = loan_status ~ ., family = "binomial", data = lc_train) Deviance
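With an 85%/15% class split, accuracy alone is misleading: a model that leans toward the majority class scores high accuracy while ranking the classes no better than chance. A minimal sketch on synthetic data (not the asker's loan data) of a constant majority-class predictor:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
# Roughly 85% positives, 15% negatives, mimicking the described imbalance
y = (rng.random(1000) < 0.85).astype(int)

# A degenerate "model": always predict the majority class, with a constant score
pred_label = np.ones_like(y)
pred_score = np.ones(len(y), dtype=float)

acc = accuracy_score(y, pred_label)   # close to 0.85
auc = roc_auc_score(y, pred_score)    # exactly 0.5: no ranking ability at all
print(acc, auc)
```

This is why ROC AUC (or precision-recall AUC) is the more informative metric here.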

Comparing AUC, log loss and accuracy scores between models

Submitted by ﹥>﹥吖頭↗ on 2020-04-16 05:08:05
Question: After running 6 models on a binary classification problem, I have the following evaluation metrics on the test set:

model  accuracy  logloss  AUC
  1      19%      0.45    0.54
  2      67%      0.62    0.67
  3      66%      0.63    0.68
  4      67%      0.62    0.66
  5      63%      0.61    0.66
  6      65%      0.68    0.42

I have the following questions: How can model 1 be the best in terms of log loss (its log loss is the closest to 0) when it performs the worst in terms of accuracy? What does that mean? How come model 6 has a lower AUC score than e.g. model 5, when
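Accuracy is computed from thresholded labels while log loss scores the predicted probabilities themselves, so the two can rank models differently. A toy sketch with made-up predictions (not the poster's models) where the model with worse accuracy has better log loss:

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss

y = np.array([1]*5 + [0]*5)

# Model A: always on the right side of 0.5, but only barely -> perfect accuracy
p_a = np.array([0.6]*5 + [0.4]*5)
# Model B: very confident and usually right, but one confident mistake
p_b = np.array([0.99]*5 + [0.99] + [0.01]*4)

acc_a = accuracy_score(y, (p_a >= 0.5).astype(int))  # 1.0
acc_b = accuracy_score(y, (p_b >= 0.5).astype(int))  # 0.9
ll_a = log_loss(y, p_a)   # ~0.51
ll_b = log_loss(y, p_b)   # ~0.47: lower (better) log loss despite worse accuracy
print(acc_a, acc_b, ll_a, ll_b)
```

The same decoupling explains the question: log loss rewards calibrated confidence, accuracy only counts threshold crossings, and AUC measures ranking, so the three need not agree.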

Summary of methods for computing AUC (Area Under the ROC Curve)

Submitted by 时光总嘲笑我的痴心妄想 on 2020-03-24 16:41:11
Reposted from http://blog.csdn.net/pzy20062141/article/details/48711355
I. The ROC curve
1. The ROC (receiver operating characteristic) curve: each point on the curve reflects the classifier's response to the same signal stimulus.
X-axis: false positive rate (FPR), i.e. 1 - specificity: the proportion of negative instances that are predicted positive, out of all negatives.
Y-axis: true positive rate (TPR), i.e. sensitivity: the proportion of positive instances that are predicted positive (positive-class coverage).
2. For a binary classification problem, each instance is either positive or negative, so in practice a prediction falls into one of four cases:
(1) A positive instance predicted as positive: a true positive (TP)
(2) A positive instance predicted as negative: a false negative (FN)
(3) A negative instance predicted as positive: a false positive (FP)
(4) A negative instance predicted as negative: a true negative (TN)
TP: the number of correct detections; FN: misses (positives that were not matched); FP: false alarms (matches that should not exist); TN: correct rejections of non-matches.
The contingency table is as follows, with 1 denoting the positive class
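The four outcomes (TP, FN, FP, TN) can be counted directly from the label and prediction vectors; a small sketch with made-up labels:

```python
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives (misses)
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives (false alarms)
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives

print(tp, fn, fp, tn)  # 3 1 1 3
```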

Summary of AUC calculation methods

Submitted by 十年热恋 on 2020-03-24 16:24:07
I. The ROC curve
1. The ROC (receiver operating characteristic) curve: each point on the curve reflects the classifier's response to the same signal stimulus.
X-axis: false positive rate (FPR), i.e. 1 - specificity: the proportion of negative instances that are predicted positive, out of all negatives.
Y-axis: true positive rate (TPR), i.e. sensitivity: the proportion of positive instances that are predicted positive (positive-class coverage).
2. For a binary classification problem, each instance is either positive or negative, so in practice a prediction falls into one of four cases:
(1) A positive instance predicted as positive: a true positive (TP)
(2) A positive instance predicted as negative: a false negative (FN)
(3) A negative instance predicted as positive: a false positive (FP)
(4) A negative instance predicted as negative: a true negative (TN)
TP: the number of correct detections; FN: misses (positives that were not matched); FP: false alarms (matches that should not exist); TN: correct rejections of non-matches.
The contingency table is as follows, with 1 denoting the positive class and 0 the negative class. From the table, the formulas for the two axes follow:
(1) True positive rate (TPR): TP/(TP+FN
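The axis formulas (TPR and FPR) can be checked against sklearn's confusion_matrix; illustrative labels, not from the post:

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

# sklearn orders the binary confusion matrix as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)  # true positive rate (sensitivity), the y-axis
fpr = fp / (fp + tn)  # false positive rate (1 - specificity), the x-axis
print(tpr, fpr)  # 0.75 0.25
```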

Comparing multiple AUCs in parallel (R)

Submitted by こ雲淡風輕ζ on 2020-03-23 07:46:10
Question: I am using the pROC package in R to calculate and compare the AUCs of multiple tests, to see which test has the best ability to discriminate between patients and controls. However, I have a large number of tests, and I essentially want to run a series of pairwise comparisons of each test's AUC against every other test and then correct for multiple comparisons. This is as far as I've gotten with my code (example with a simulated, replicable dataset below): #load pROC library(pROC) #generate df with
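In pROC this would typically be done with roc.test over every pair plus p.adjust; as a language-neutral illustration of the same idea, here is a Python bootstrap sketch (not pROC's DeLong test; the test names, effect sizes, and sample sizes are all invented for the example):

```python
import itertools
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)
y = np.repeat([0, 1], 100)  # 100 controls, 100 patients (hypothetical)

# Three hypothetical tests with decreasing ability to separate the groups
tests = {name: y * shift + rng.normal(size=y.size)
         for name, shift in [("A", 2.0), ("B", 1.0), ("C", 0.2)]}

def auc_diff_pvalue(y, s1, s2, n_boot=500, seed=0):
    """Two-sided bootstrap p-value for the paired difference AUC(s1) - AUC(s2)."""
    r = np.random.default_rng(seed)
    observed = roc_auc_score(y, s1) - roc_auc_score(y, s2)
    diffs = []
    for _ in range(n_boot):
        idx = r.integers(0, y.size, y.size)   # resample subjects with replacement
        if np.unique(y[idx]).size < 2:        # AUC needs both classes present
            continue
        diffs.append(roc_auc_score(y[idx], s1[idx]) - roc_auc_score(y[idx], s2[idx]))
    diffs = np.asarray(diffs)
    p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
    return observed, min(p, 1.0)

pairs = list(itertools.combinations(tests, 2))
results = {}
for a, b in pairs:
    obs, p = auc_diff_pvalue(y, tests[a], tests[b])
    results[(a, b)] = (obs, min(p * len(pairs), 1.0))  # Bonferroni adjustment
    print(a, "vs", b, results[(a, b)])
```

Bonferroni is the simplest correction; for many pairs a less conservative method (e.g. Holm or Benjamini-Hochberg) is usually preferred.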

Understanding and implementing machine learning the quick-and-dirty way: logistic regression (4) - classification evaluation methods, precision and recall, F1-score, the classification report API, ROC curves and the AUC metric

Submitted by 送分小仙女□ on 2020-03-04 14:20:53
Logistic Regression

Contents:
Learning goals
3.4 Classification evaluation methods
1. Classification evaluation methods
  1.1 Precision and recall
    1.1.1 The confusion matrix
    1.1.2 Precision and recall
  1.2 F1-score
  1.3 The classification report API
2. ROC curves and the AUC metric
  2.1 TPR and FPR
  2.2 The ROC curve
  2.3 The AUC metric
  2.4 The AUC calculation API
3. Summary

Learning goals:
- Know the loss function of logistic regression
- Know the optimization methods of logistic regression
- Know the sigmoid function
- Know the application scenarios of logistic regression
- Apply LogisticRegression to make predictions
- Know the difference between the precision and recall metrics
- Know how to evaluate models when the classes are imbalanced
- Understand what the ROC curve means and what the size of the AUC metric indicates
- Apply classification_report to compute precision and recall
- Apply roc_auc_score to compute the metric

3.4 Classification evaluation methods
Review: classification evaluation metrics.
1. Classification evaluation methods
1.1 Precision and recall
1.1.1 The confusion matrix
In a classification task, the predicted condition and the true condition combine in four different ways, which together form the confusion matrix (this also applies to multi-class problems).
1.1.2 Precision and recall
Precision: among the samples predicted positive, the proportion that are truly positive.
Recall: among the truly positive samples, the proportion that are predicted positive (how exhaustively
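The two APIs named in the learning goals, classification_report and roc_auc_score, can be exercised in a few lines on toy labels (made up here, not the tutorial's data):

```python
from sklearn.metrics import classification_report, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # thresholded class labels
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]  # continuous scores

# Precision, recall and F1-score per class, as a text report
report = classification_report(y_true, y_pred)
print(report)

# AUC needs the continuous scores, not the thresholded labels
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.9375
```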

Understanding and implementing machine learning the quick-and-dirty way: logistic regression (5) - drawing the ROC curve

Submitted by 余生颓废 on 2020-03-04 13:52:59
Logistic Regression

Contents:
Learning goals
3.5 Drawing the ROC curve
1 Drawing the curve
  1.1 If the probability sequence is (1:0.9, 2:0.7, 3:0.8, 4:0.6, 5:0.5, 6:0.4)
  1.2 If the probability sequence is (1:0.9, 2:0.8, 3:0.7, 4:0.6, 5:0.5, 6:0.4)
  1.3 If the probability sequence is (1:0.4, 2:0.6, 3:0.5, 4:0.7, 5:0.8, 6:0.9)
2 Interpretation

3.5 Drawing the ROC curve
The drawing of the ROC curve is explained through the following example. Suppose there are 6 ad impressions, two of which were clicked, giving the display sequence (1:1, 2:0, 3:1, 4:0, 5:0, 6:0), where the first number is the impression's index and the second marks whether it was clicked (1) or not (0). For each of the 6 impressions the model also computed a click probability. Three cases are considered below.
1 Drawing the curve
1.1 If the probability sequence is (1:0.9, 2:0.7, 3:0.8, 4:0.6, 5:0.5, 6:0
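The first case, clicks (1:1, 2:0, 3:1, 4:0, 5:0, 6:0) with probabilities (1:0.9, 2:0.7, 3:0.8, 4:0.6, 5:0.5, 6:0.4), can be reproduced with sklearn's roc_curve:

```python
from sklearn.metrics import roc_curve, roc_auc_score

clicks = [1, 0, 1, 0, 0, 0]              # impressions 1..6: clicked or not
probs = [0.9, 0.7, 0.8, 0.6, 0.5, 0.4]   # model's click probabilities (case 1.1)

# Points of the ROC curve: one (FPR, TPR) pair per distinct threshold
fpr, tpr, thresholds = roc_curve(clicks, probs)
print(list(zip(fpr, tpr)))

# Both clicked impressions outrank every unclicked one, so the AUC is perfect
auc = roc_auc_score(clicks, probs)
print(auc)  # 1.0
```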