R: Plotting ROC curves without the “PEC” library

Question

I am working on a computer that has no internet access and no USB port. I only have R with a limited set of libraries (for example, I do not have access to the "pec" library). I am following a GitHub tutorial in which a survival analysis model is fit to some data, and then a ROC (Receiver Operating Characteristic) curve is plotted to measure the performance of the model.

I am following this tutorial over here: https://gist.github.com/thomasmooon/6eb87964ea663f4a7441cc2b2b730bd4

Everything runs perfectly on my personal computer. On my work computer, however, I am unable to install the "pec" library, which means I cannot plot the ROC curve. I do have libraries such as "caret", "dplyr" and "ggplot2", so I was thinking it might be possible to plot the ROC curve with other tools in R, such as base R graphics or ggplot2.

Here is the first part of the code - this seems to run perfectly without installing the "pec" library:

# Content

This script briefly introduces training and prediction of a random survival forest
using the package "ranger" on a toy dataset. Beforehand, this toy dataset is
split into n folds for n-fold cross-validation. Finally, the prediction performance
is measured using the concordance index.


1. data
 + set up toy dataset
 + some descriptive statistics
 + set up a cross validation schema

2. model, training
 + define RSF model + parameters
 + train model on each cross validation subsample
 + compute predictions on the hold out test set

3. prediction performance
 + cindex
 + prediction error curves


# setup
```{r}
rm(list=ls())
library(ranger) # RSF 
library(survival) # contains survival examples, handle survival objects
library(caret) # for stratified cross-validation
library(dplyr) # data manipulation
# library(pec) # I DO NOT HAVE ACCESS TO THIS LIBRARY
```

track session info
```{r session info}
sessionInfo()
```


# 1. data

set up toy dataset
```{r}
data <- survival::cancer # NCCTG Lung Cancer Data,  censoring status 1=censored, 2=dead
str(data)
```

preprocessing
```{r}
# change label of status variable
data <- data %>% mutate(status = status-1) # 0 = censored, 1 = dead
# some data contain missing values, for simplification I omit observations with NA's
data <- na.omit(data)
# scale to months
data$time <- floor(data$time/30)
```


some descriptive statistics
```{r fig.height=10, fig.width=10}
pairs(data %>% select(-time,-status), main = "NCCTG Lung Cancer Data")
```

set up a cross validation schema
```{r}
# cross validation, stratified on status variable to ensure that each group (here censored, dead)
# is equally distributed over the cross-validation folds
folds <- 2 # for <nfold> cross-validation
cvIndex <- createFolds(factor(data$status), k = folds, returnTrain = TRUE)
```


# 2. model, training
```{r}
# create some containers to store results
# (not reasonable for big models, for big models you may want to store intermediate results on disk)
container_model <- vector("list",length(cvIndex))
container_pred <- container_model
# run RSF with default params --------------------------------------------------
# iterate through cv-folds
for (i in seq_along(cvIndex)) {
  
  # define training / test data
  train_data <- data[cvIndex[[i]],]
  eval_data <- data[-cvIndex[[i]],]
  
  # train
  rsf <- ranger(Surv(time = time, event = status) ~ ., data = train_data)
  
  # predict (on hold out test set)
  pred <- predict(rsf, eval_data)
  
  # store results
  container_model[[i]] <- rsf
  container_pred[[i]] <- pred
}
```
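
For the later evaluation steps it helps to know how to read a single survival probability out of each stored prediction. A minimal sketch, assuming (as the `pec` chunk further down does) that `pred$survival` has one row per subject and one column per entry of `pred$unique.death.times`; `surv_at` is a hypothetical helper name, not part of ranger:

```r
# Hypothetical helper: predicted survival probability S(horizon) per subject,
# treating each row of surv_matrix as a right-continuous step function that
# jumps at death_times (assumed sorted in increasing order).
surv_at <- function(surv_matrix, death_times, horizon) {
  idx <- findInterval(horizon, death_times) # number of death times <= horizon
  if (idx == 0) {
    return(rep(1, nrow(surv_matrix)))       # before the first death: S = 1
  }
  surv_matrix[, idx]
}

# toy check: two subjects (rows), death times 2, 5, 9 (columns)
toy_surv <- matrix(c(0.9, 0.8,
                     0.7, 0.5,
                     0.4, 0.2), nrow = 2)
surv_at(toy_surv, c(2, 5, 9), horizon = 6)
```

With the objects above this would be called as `surv_at(container_pred[[i]]$survival, container_pred[[i]]$unique.death.times, horizon = 12)`, where 12 months is an arbitrary illustrative horizon.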

show some model summaries
```{r}
container_model
```
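
The tutorial's outline lists the concordance index under prediction performance, but no code for it appears here. A minimal base R sketch of Harrell's C follows, assuming `status` is coded 0 = censored, 1 = dead and a numeric risk score per subject where higher means earlier expected death. `harrell_c` is a hypothetical helper, and pairs with tied observed times are simply skipped, so values can differ slightly from packaged implementations:

```r
# Harrell's concordance index (hypothetical helper).
# risk: numeric score per subject, higher = earlier expected death
harrell_c <- function(time, status, risk) {
  concordant <- 0
  tied_risk  <- 0
  comparable <- 0
  for (i in which(status == 1)) {
    # subject i died; compare with every subject observed longer than i
    later <- which(time > time[i])
    comparable <- comparable + length(later)
    concordant <- concordant + sum(risk[i] > risk[later])
    tied_risk  <- tied_risk + sum(risk[i] == risk[later])
  }
  # risk ties count as half-concordant
  (concordant + 0.5 * tied_risk) / comparable
}

toy_time   <- c(1, 2, 3, 4)
toy_status <- c(1, 1, 0, 1)
harrell_c(toy_time, toy_status, risk = c(4, 3, 2, 1)) # perfectly ordered risks
```

For fold `i`, the times and statuses come from `data[-cvIndex[[i]], ]`. One candidate risk score is `rowSums(container_pred[[i]]$chf)`, assuming the ranger prediction object exposes the cumulative hazard as `chf`; one minus a column of `container_pred[[i]]$survival` is another.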

Now, this is the part of the code that does not run:

# 3. prediction performance
```{r}
container_pec <- vector("list",length(cvIndex))
for (i in seq_along(cvIndex)) {
  # define training / test data
  train_data <- data[cvIndex[[i]], ]
  eval_data <- data[-cvIndex[[i]], ]
  
  # adapt w.r.t. overlapping timepoints
  times <- intersect(train_data$time,eval_data$time)
  times_overlap <- which(container_pred[[i]]$unique.death.times %in% times)
  
  model <- list("rsf"=container_pred[[i]]$survival[,times_overlap])
  
  pec_ <- pec(
    object = model,
    formula = Surv(time = time, event = status) ~ 1, # Kaplan-Meier
    traindata = train_data,
    data = eval_data,
    exact = FALSE,
    times = times # evaluation time points
  )
   
  # store results in container
  container_pec[[i]] <- pec_
}
# some summary stats
lapply(container_pec,crps)
# plot
lapply(container_pec, plot)
```

Does anyone know if it is possible to run the above code without using the "pec" library? I also have the libraries "ROCR" and "MLmetrics".
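
What `pec()` computes in the chunk above is a prediction error curve, i.e. a time-dependent Brier score, which is a different thing from a ROC curve. It can be approximated with the survival package alone. The sketch below assumes `status` is coded 0 = censored, 1 = dead, weights by a marginal Kaplan-Meier estimate of the censoring distribution (inverse probability of censoring weighting), and uses the hypothetical helper names `ipcw_brier` and `integrated_brier`. It is not guaranteed to match `pec` or `crps` output digit for digit:

```r
library(survival)

# Time-dependent Brier score with inverse-probability-of-censoring weights.
# time, status : observed times and event indicators (0 = censored, 1 = dead)
# surv_prob    : matrix of predicted survival probabilities,
#                one row per subject, one column per entry of eval_times
ipcw_brier <- function(time, status, surv_prob, eval_times) {
  # Kaplan-Meier for the censoring distribution: censoring is the "event"
  cens_fit <- survfit(Surv(time, 1 - status) ~ 1)
  G      <- stepfun(cens_fit$time, c(1, cens_fit$surv))               # G(t)
  G_left <- stepfun(cens_fit$time, c(1, cens_fit$surv), right = TRUE) # G(t-)

  sapply(seq_along(eval_times), function(j) {
    t_j <- eval_times[j]
    dead_by_t  <- time <= t_j & status == 1
    alive_at_t <- time > t_j
    w <- numeric(length(time))        # censored before t_j: weight stays 0
    w[dead_by_t]  <- 1 / G_left(time[dead_by_t])
    w[alive_at_t] <- 1 / G(t_j)
    # squared distance between observed alive/dead state and predicted S(t_j)
    mean(w * (as.numeric(alive_at_t) - surv_prob[, j])^2)
  })
}

# Trapezoidal area under the Brier curve, scaled by the time range
integrated_brier <- function(brier, eval_times) {
  mid <- (head(brier, -1) + tail(brier, -1)) / 2
  sum(diff(eval_times) * mid) / diff(range(eval_times))
}

# toy data without censoring
toy_time   <- c(1, 2, 3, 4)
toy_status <- c(1, 1, 1, 1)
toy_times  <- c(1.5, 2.5)
toy_surv   <- cbind(c(0, 1, 1, 1), c(0, 0, 1, 1)) # perfect predictions
ipcw_brier(toy_time, toy_status, toy_surv, toy_times)
```

For fold `i` of the code above, the inputs would be `eval_data$time`, `eval_data$status`, `container_pred[[i]]$survival[, times_overlap]` and `container_pred[[i]]$unique.death.times[times_overlap]`. Plotting the returned vector against the evaluation times with `plot(..., type = "s")` gives a base graphics version of the curve.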

Thanks
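
For the ROC curve itself, one base R option is to fix a time horizon and compare subjects who died by then against subjects still alive afterwards. The sketch below is a deliberately naive version: it drops subjects censored before the horizon instead of reweighting them, assumes `status` is coded 0 = censored, 1 = dead, and uses the hypothetical helper names `roc_at_horizon` and `plot_roc`:

```r
# ROC points at a fixed time horizon.
# risk: numeric score per subject, higher = more likely to die early
roc_at_horizon <- function(time, status, risk, horizon) {
  is_case    <- time <= horizon & status == 1 # died by the horizon
  is_control <- time > horizon                # known to be alive at the horizon
  # subjects censored before the horizon are neither and are left out
  case_score    <- risk[is_case]
  control_score <- risk[is_control]
  thresholds <- c(Inf, sort(unique(c(case_score, control_score)), decreasing = TRUE))
  tpr <- sapply(thresholds, function(th) mean(case_score >= th))
  fpr <- sapply(thresholds, function(th) mean(control_score >= th))
  # trapezoidal area under the curve
  auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
  list(fpr = fpr, tpr = tpr, auc = auc)
}

# Base graphics plot of the result of roc_at_horizon()
plot_roc <- function(roc, main = "ROC at fixed horizon") {
  plot(roc$fpr, roc$tpr, type = "l", xlim = c(0, 1), ylim = c(0, 1),
       xlab = "False positive rate", ylab = "True positive rate", main = main)
  abline(0, 1, lty = 2) # chance line
}

# toy data: subject 3 is censored at time 3 and dropped for horizon 3.5
toy_time   <- c(1, 2, 3, 4, 5, 6)
toy_status <- c(1, 1, 0, 1, 1, 1)
toy_risk   <- c(0.9, 0.8, 0.5, 0.3, 0.2, 0.1)
toy_roc <- roc_at_horizon(toy_time, toy_status, toy_risk, horizon = 3.5)
toy_roc$auc
```

`plot_roc(toy_roc)` draws the curve. For fold `i`, a possible risk score is one minus the column of `container_pred[[i]]$survival` that corresponds to the chosen horizon. If ROCR is installed, `ROCR::prediction()` followed by `ROCR::performance(pred, "tpr", "fpr")` on the same case/control labels should give an equivalent curve. With no cases or no controls at the horizon, the rates come out as `NaN`.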

Source: https://stackoverflow.com/questions/65137064/r-plotting-roc-curves-without-the-pec-library

Tags: ggplot2, plot, data-visualization