Difference in average AUC computation using ROCR and pROC (R)

Question


I am working with cross-validation data (10-fold, repeated 5 times) from an SVM-RFE model generated with the caret package. I know that caret uses the pROC package when computing metrics, but I need the ROCR package in order to obtain the average ROC curve. However, I noticed that the average AUC values were not the same with each package, so I am not sure whether I can use the two packages interchangeably.

The code I used to check this is:

library(caret)
library(pROC)
library(ROCR)

predictions_NG3 <- list()
labels_NG3 <- list()

optSize <- svmRFE_NG3$optsize

# Split the resampled predictions by subset size, keep the optimal size,
# then split it into the 50 resamples (10 folds x 5 repeats)
resamples <- split(svmRFE_NG3$pred, svmRFE_NG3$pred$Variables)
resamplesFOLD <- split(resamples[[optSize]], resamples[[optSize]]$Resample)

auc_pROC <- vector()
auc_ROCR <- vector()

for (i in 1:50){
  predictions_NG3[[i]] <- resamplesFOLD[[i]]$LUNG
  labels_NG3[[i]] <- resamplesFOLD[[i]]$obs

  # With pROC
  rocCurve <- roc(response = labels_NG3[[i]],
                  predictor = predictions_NG3[[i]],
                  levels = c("BREAST","LUNG")) # LUNG positive

  auc_pROC <- c(auc_pROC, auc(rocCurve))

  # With ROCR
  pred_ROCR <- prediction(predictions_NG3[[i]], labels_NG3[[i]],
                          label.ordering = c("BREAST","LUNG")) # LUNG positive

  auc_ROCR <- c(auc_ROCR, performance(pred_ROCR, "auc")@y.values[[1]])
}

auc_mean_pROC <- mean(auc_pROC)
auc_sd_pROC <- sd(auc_pROC)
auc_mean_ROCR <- mean(auc_ROCR)
auc_sd_ROCR <- sd(auc_ROCR)

The results are slightly different:

  auc_mean_pROC auc_sd_pROC auc_mean_ROCR auc_sd_ROCR
1     0.8755556   0.1524801     0.8488889   0.2072751

Looking at the individual values, the two packages disagree on several folds, for example at positions [5], [22] and [25]:

> auc_pROC
 [1] 0.8333333 0.8333333 1.0000000 1.0000000 0.6666667 0.8333333 0.3333333 0.8333333 1.0000000 1.0000000 1.0000000 1.0000000
[13] 0.8333333 0.5000000 0.8888889 1.0000000 1.0000000 1.0000000 0.8333333 0.8333333 0.8333333 0.6666667 0.6666667 0.8888889
[25] 0.8333333 0.6666667 1.0000000 0.6666667 1.0000000 0.6666667 1.0000000 1.0000000 0.8333333 0.8333333 0.8333333 1.0000000
[37] 0.8333333 1.0000000 0.8333333 1.0000000 0.8333333 1.0000000 1.0000000 0.6666667 1.0000000 1.0000000 1.0000000 1.0000000
[49] 1.0000000 1.0000000
> auc_ROCR
 [1] 0.8333333 0.8333333 1.0000000 1.0000000 0.3333333 0.8333333 0.3333333 0.8333333 1.0000000 1.0000000 1.0000000 1.0000000
[13] 0.8333333 0.5000000 0.8888889 1.0000000 1.0000000 1.0000000 0.8333333 0.8333333 0.8333333 0.3333333 0.6666667 0.8888889
[25] 0.1666667 0.6666667 1.0000000 0.6666667 1.0000000 0.6666667 1.0000000 1.0000000 0.8333333 0.8333333 0.8333333 1.0000000
[37] 0.8333333 1.0000000 0.8333333 1.0000000 0.8333333 1.0000000 1.0000000 0.6666667 1.0000000 1.0000000 1.0000000 1.0000000
[49] 1.0000000 1.0000000

I have tried other SVM-RFE models, but the problem persists. Why is this happening? Am I doing something wrong?


Answer 1:


By default, the roc function in pROC attempts to detect which response level corresponds to the controls and which to the cases (you overrode that by setting the levels argument), and whether controls should have higher or lower predictor values than the cases. You haven't set the latter, the direction argument.

When you resample your data, this auto-detection happens anew for every sample. If your sample size is small, or your AUC is close to 0.5, it can and will happen that some ROC curves are built with the opposite direction, which replaces their AUC with 1 minus its value and biases your average towards higher values. That is exactly what happens in folds [5], [22] and [25] above: the pROC AUC equals 1 minus the ROCR AUC.
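
Here is a minimal sketch (with hypothetical labels and scores, not the question's data) of how the auto-detection flips a fold whose scores point the wrong way:

library(pROC)

# Hypothetical fold: LUNG is the positive class, but the scores
# rank BREAST *above* LUNG, i.e. the classifier points the wrong way
labels <- factor(c("BREAST", "BREAST", "BREAST", "LUNG", "LUNG", "LUNG"))
scores <- c(0.9, 0.8, 0.7, 0.2, 0.3, 0.1)

# direction auto-detected: pROC notices that controls score higher
# than cases, silently uses direction = ">" and reports AUC = 1
auc(roc(labels, scores, levels = c("BREAST", "LUNG")))

# direction fixed: "<" means controls must score lower than cases,
# so the wrong-way fold is reported as it is, AUC = 0
auc(roc(labels, scores, levels = c("BREAST", "LUNG"), direction = "<"))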

Therefore you should always set the direction argument explicitly when you resample ROC curves or do anything similar, for instance:

rocCurve <- roc(response = labels_NG3[[i]],
                predictor = predictions_NG3[[i]],
                direction = "<",
                levels = c("BREAST","LUNG")) 
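
With direction fixed to "<", pROC reports each fold as-is and its per-fold AUCs should agree with ROCR's: ROCR's label.ordering argument already pins the orientation, which is why ROCR reports values below 0.5 (e.g. 0.1666667 in fold [25]) where pROC's auto-detection returned the flipped value 0.8333333.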


Source: https://stackoverflow.com/questions/37252317/difference-in-average-auc-computation-using-rocr-and-proc-r
