Plotting mean ROC curve for multiple ROC curves, R

Submitted by 試著忘記壹切 on 2020-01-02 12:47:09

Question


I have a dataset of 100 samples, each of which has 195 mutations with their known clinical significance ("RealClass") and a predicted value from a prediction tool ("PredictionValues").

For the demonstration, this is a random dataset that has the same structure as my dataset:

# 100 samples x 195 mutations = 19500 rows
predictions_100_samples<-as.data.frame(matrix(nrow=19500,ncol=3))
colnames(predictions_100_samples)<-c("Sample","PredictionValues","RealClass")
predictions_100_samples$Sample<-rep(c(1:100), each = 195)
set.seed(500)  # make the random prediction values reproducible
predictions_100_samples$PredictionValues<-sample(seq(0,1,length.out=19500))
# The 20-element pattern (10 pathogenic, 10 benign) is recycled over all 19500 rows
predictions_100_samples$RealClass<-rep(c("pathogenic","benign"),each=10)
colours_for_ROC_curves<-rainbow(n=100)

I plotted the ROC curves of all 100 samples with the pROC package:

library("pROC")
# Draw the first sample's curve to set up the plot
roc_both <- plot(roc(predictor=predictions_100_samples[1:195,2],response = predictions_100_samples[1:195,3]), col = colours_for_ROC_curves[1],main="100 samples ROC curves",legacy.axes=TRUE,lwd=1)
# Add the curves of samples 2 to 100 (each sample occupies 195 consecutive rows)
for(i in 2:100){
    rows <- (((i-1)*195)+1):(i*195)
    roc_both <- plot(roc(predictor=predictions_100_samples[rows,2],response = predictions_100_samples[rows,3]), col = colours_for_ROC_curves[i], add = TRUE,lwd=1)
}

The final plot shows the 100 ROC curves overlaid in different colours.

Now I want to add the mean of all 100 plotted ROC curves to the same plot. I tried to use the sensitivities and specificities that the "roc" function calculates for each threshold inside the loop I wrote (they are available as roc_both$sensitivities, roc_both$specificities and roc_both$thresholds).

The main problem was that the chosen thresholds were not the same across the 100 ROC curves I plotted, so I couldn't calculate the mean ROC curve manually.

Is there a different package that may allow me to produce the mean ROC curves of multiple ROC curves? Or is there a package that allows setting the thresholds for calculating sensitivity and specificity manually, so I could later on be able to calculate the mean ROC curve? Do you maybe have a different solution for my problem?

Thank you!


Answer 1:


You can use cutpointr for specifying the thresholds manually via the oc_manual function. I altered the data generation a bit so that the ROC curve looks a little nicer.

We apply the same sequence of thresholds to all samples and take the mean of the sensitivity and specificity per threshold to get the "mean ROC curve".

predictions_100_samples <- data.frame(
    Sample = rep(c(1:100), times = 195),
    PredictionValues = c(rnorm(n = 9750), rnorm(n = 9750, mean = 1)),
    RealClass = c(rep("benign", times = 9750), rep("pathogenic", times = 9750))
)

library(cutpointr)
library(tidyverse)
mean_roc <- function(data, cutoffs = seq(from = -5, to = 5, by = 0.5)) {
    map_df(cutoffs, function(cp) {
        out <- cutpointr(data = data, x = PredictionValues, class = RealClass,
                         subgroup = Sample, method = oc_manual, cutpoint = cp,
                         pos_class = "pathogenic", direction = ">=")
        data.frame(cutoff = cp, 
                   sensitivity = mean(out$sensitivity),
                   specificity = mean(out$specificity))
    })
}

mr <- mean_roc(predictions_100_samples)
ggplot(mr, aes(x = 1 - specificity, y = sensitivity)) + 
    geom_step() + geom_point() +
    theme(aspect.ratio = 1)
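
To make explicit what "the mean of the sensitivity and specificity per threshold" computes, here is a minimal base R sketch of the same averaging without cutpointr. The helper name mean_roc_base, the demo data frame and the seed are hypothetical and chosen for illustration; the sketch assumes the column names used above and that values at or above the cutoff are predicted "pathogenic", mirroring direction = ">=".

```r
# Hypothetical helper (not part of cutpointr): mean sensitivity and specificity
# per cutoff across samples, using base R only.
# Assumes columns Sample, PredictionValues and RealClass, and that
# values >= cutoff are predicted as the positive class.
mean_roc_base <- function(data, cutoffs = seq(from = -5, to = 5, by = 0.5),
                          pos_class = "pathogenic") {
    per_sample <- split(data, data$Sample)
    rows <- lapply(cutoffs, function(cp) {
        # Sensitivity of every sample at this cutoff: share of positives >= cp
        sens <- sapply(per_sample, function(d) {
            mean(d$PredictionValues[d$RealClass == pos_class] >= cp)
        })
        # Specificity of every sample at this cutoff: share of negatives < cp
        spec <- sapply(per_sample, function(d) {
            mean(d$PredictionValues[d$RealClass != pos_class] < cp)
        })
        data.frame(cutoff = cp,
                   sensitivity = mean(sens),
                   specificity = mean(spec))
    })
    do.call(rbind, rows)
}

# Demo data with the same structure as above (the seed is arbitrary)
set.seed(1)
demo <- data.frame(
    Sample = rep(1:100, times = 195),
    PredictionValues = c(rnorm(n = 9750), rnorm(n = 9750, mean = 1)),
    RealClass = c(rep("benign", times = 9750), rep("pathogenic", times = 9750))
)
mr_base <- mean_roc_base(demo)
head(mr_base)
```

The result has one row per cutoff, so it can be plotted with the same aesthetics (x = 1 - specificity, y = sensitivity). The sketch requires every sample to contain both classes; otherwise the per-sample means are undefined.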

You can plot the separate ROC curves and the added mean ROC curve with cutpointr this way:

cutpointr(data = predictions_100_samples, 
          x = PredictionValues, class = RealClass, subgroup = Sample,
          pos_class = "pathogenic", direction = ">=") %>% 
    plot_roc(display_cutpoint = FALSE) + theme(legend.position="none") +
    geom_line(data = mr, mapping = aes(x = 1 - specificity, y = sensitivity), 
              color = "black")
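
If the thresholds cannot be fixed in advance, as with the per-sample thresholds described in the question, a different technique is vertical averaging: interpolate every curve at a common grid of false positive rates and average the true positive rates. This is a swapped-in alternative to the per-threshold mean above, not what cutpointr does. The sketch below uses base R only; the helper names roc_points and mean_roc_vertical, the demo data and the seed are hypothetical, and it assumes that higher prediction values indicate "pathogenic".

```r
# Hypothetical helper: empirical ROC points (fpr, tpr) of one sample.
# Assumes higher scores indicate the positive class.
roc_points <- function(scores, labels, pos_class = "pathogenic") {
    # Every observed score is a candidate threshold; Inf and -Inf add the
    # end points (0, 0) and (1, 1)
    thresholds <- c(Inf, sort(unique(scores), decreasing = TRUE), -Inf)
    pos <- scores[labels == pos_class]
    neg <- scores[labels != pos_class]
    data.frame(
        fpr = sapply(thresholds, function(t) mean(neg >= t)),
        tpr = sapply(thresholds, function(t) mean(pos >= t))
    )
}

# Hypothetical helper: vertical averaging of the per-sample ROC curves.
mean_roc_vertical <- function(data, fpr_grid = seq(0, 1, length.out = 101)) {
    tprs <- sapply(split(data, data$Sample), function(d) {
        pts <- roc_points(d$PredictionValues, d$RealClass)
        # ties = max keeps the highest TPR where several points share one FPR
        approx(pts$fpr, pts$tpr, xout = fpr_grid, ties = max)$y
    })
    # Rows are grid points, columns are samples
    data.frame(fpr = fpr_grid, tpr = rowMeans(tprs))
}

# Demo data with the same structure as above (the seed is arbitrary)
set.seed(1)
demo <- data.frame(
    Sample = rep(1:100, times = 195),
    PredictionValues = c(rnorm(n = 9750), rnorm(n = 9750, mean = 1)),
    RealClass = c(rep("benign", times = 9750), rep("pathogenic", times = 9750))
)
mv <- mean_roc_vertical(demo)
head(mv)
```

The averaged curve can be drawn on its own with plot(mv$fpr, mv$tpr, type = "l"). Vertical averaging and per-threshold averaging generally give different curves, so the choice depends on whether the thresholds themselves are meaningful across samples.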

Alternatively, you may want to look into the theory on summary ROC curves (SROC) for fitting a parametric model that combines multiple ROC curves.



Source: https://stackoverflow.com/questions/52467915/plotting-mean-roc-curve-for-multiple-roc-curves-r
