Calculate AUC in R?

感动是毒 2020-12-07 09:45

Given a vector of scores and a vector of actual class labels, how do you calculate a single-number AUC metric for a binary classifier in the R language or in simple English?

10 Answers
  •  攒了一身酷
    2020-12-07 10:16

    You can learn more about AUROC in this blog post by Miron Kursa:

    https://mbq.me/blog/augh-roc/

    He provides a fast function for AUROC:

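    # By Miron Kursa https://mbq.me
    # score: numeric vector of classifier scores
    # bool:  logical vector, TRUE for the positive class
    auroc <- function(score, bool) {
      n1 <- sum(!bool)  # number of negatives
      n2 <- sum(bool)   # number of positives
      # Mann-Whitney U statistic of the negatives, from their rank sum
      U  <- sum(rank(score)[!bool]) - n1 * (n1 + 1) / 2
      return(1 - U / n1 / n2)
    }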
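
    In plain terms, the AUC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as one half. The rank-based function above arrives at that number without comparing pairs explicitly. A minimal brute-force sketch of the same quantity follows. Note that auroc_pairwise() and the four-point example are illustrative assumptions, not part of Miron Kursa's post:

    ```r
    # Hypothetical helper for illustration: AUC by direct pairwise comparison.
    auroc_pairwise <- function(score, bool) {
      pos <- score[bool]   # scores of the positive class
      neg <- score[!bool]  # scores of the negative class
      # Compare every positive score with every negative score.
      wins <- sum(outer(pos, neg, ">"))
      ties <- sum(outer(pos, neg, "=="))
      # Fraction of pairs ranked correctly, ties counted as one half.
      (wins + 0.5 * ties) / (length(pos) * length(neg))
    }

    # Made-up example: 3 of the 4 positive/negative pairs are ordered correctly.
    score <- c(0.1, 0.4, 0.35, 0.8)
    bool  <- c(FALSE, FALSE, TRUE, TRUE)
    auroc_pairwise(score, bool)  # 0.75
    ```

    This version builds a matrix with one cell per positive/negative pair, so it is only practical for small inputs. Its value is as a readable cross-check.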
    

    Let's test it:

    set.seed(42)
    score <- rnorm(1e3)
    bool  <- sample(c(TRUE, FALSE), 1e3, replace = TRUE)
    
    pROC::auc(bool, score)
    mltools::auc_roc(score, bool)
    ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values[[1]]
    auroc(score, bool)
    
    0.51371668847094
    0.51371668847094
    0.51371668847094
    0.51371668847094
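
    The random scores in this test are continuous, so they are very unlikely to contain ties, whereas real classifier scores often do. Because rank() averages tied ranks by default, the function should count a tied positive/negative pair as one half. A small sketch to check this, with auroc() repeated so the snippet runs on its own. The example vectors are made up for illustration:

    ```r
    # Same function as in the answer, repeated so this snippet is self-contained.
    auroc <- function(score, bool) {
      n1 <- sum(!bool)
      n2 <- sum(bool)
      U  <- sum(rank(score)[!bool]) - n1 * (n1 + 1) / 2
      1 - U / n1 / n2
    }

    # All scores identical: every pair is a tie, so the AUC is 0.5.
    auroc(rep(1, 4), c(TRUE, FALSE, TRUE, FALSE))        # 0.5

    # One tied pair (2 vs 2) and three correctly ordered pairs: 3.5 / 4.
    auroc(c(1, 2, 2, 3), c(FALSE, TRUE, FALSE, TRUE))    # 0.875
    ```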
    

    By median time in the benchmark below, auroc() is about 90 times faster than pROC::auc() and about 50 times faster than computeAUC().

    It is about 25 times faster than mltools::auc_roc() and about 12 times faster than ROCR::performance().

    # computeAUC() is not defined in this answer; define it before running.
    print(microbenchmark::microbenchmark(
      pROC::auc(bool, score),
      computeAUC(score[bool], score[!bool]),
      mltools::auc_roc(score, bool),
      ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values,
      auroc(score, bool)
    ))
    
    Unit: microseconds
                                                                 expr       min         lq       mean     median        uq        max neval  cld
                                               pROC::auc(bool, score) 21000.146 22005.3350 23738.3447 22206.5730 22710.853  32628.347   100    d
                                computeAUC(score[bool], score[!bool]) 11878.605 12323.0305 16173.0645 12378.5540 12624.981 233701.511   100   c
                                        mltools::auc_roc(score, bool)  5750.651  6186.0245  6495.5158  6325.3955  6573.993  14698.244   100  b
     ROCR::performance(ROCR::prediction(score, bool), "auc")@y.values  2899.573  3019.6310  3300.1961  3068.0240  3237.534  11995.667   100 ab
                                                   auroc(score, bool)   236.531   245.4755   253.1109   251.8505   257.578    300.506   100 a
    
