How to Create a loop (when levels do not overlap the reference)

冷眼眸甩不掉的悲伤 submitted on 2020-12-14 12:11:19

Question


I have written some code in R. This code takes some data and splits it into a training set and a test set. Then, I fit a "survival random forest" model on the training set. Afterwards, I use the model to predict the observations in the test set.

Due to the type of problem I am dealing with ("survival analysis"), a confusion matrix has to be made for each unique time (stored in ranger_fit$unique.death.times). For each of these confusion matrices, I am interested in the corresponding "sensitivity" value (e.g. sensitivity_1001, sensitivity_2005, etc.). I would like to collect all these sensitivity values, plot them against the unique death times, and determine the average sensitivity.

In order to do this, I need to repeatedly calculate the sensitivity for each time point in "unique.death.times". I tried doing this manually and it is taking a long time.

Could someone please show me how to do this with a "loop"?

I have posted my code below:

#load libraries
library(survival)
library(data.table)
library(pec)
library(ranger)
library(caret)

#load data
data(cost)

#split data into train and test
ind <- sample(1:nrow(cost),round(nrow(cost) * 0.7,0))
cost_train <- cost[ind,]
cost_test <- cost[-ind,]

#fit survival random forest model
ranger_fit <- ranger(Surv(time, status) ~ .,
                data = cost_train,
                mtry = 3,
                verbose = TRUE,
                write.forest=TRUE,
                num.trees= 1000,
                importance = 'permutation')

#optional: plot training results
plot(ranger_fit$unique.death.times, ranger_fit$survival[1,], type = 'l', col = 'red')    # for first observation
lines(ranger_fit$unique.death.times, ranger_fit$survival[21,], type = 'l', col = 'blue')  # for twenty first observation

#predict observations test set using the survival random forest model
ranger_preds <- predict(ranger_fit, cost_test, type = 'response')$survival
ranger_preds <- data.table(ranger_preds)
colnames(ranger_preds) <- as.character(ranger_fit$unique.death.times)

From here, another user (Justin Singh) from a previous post (R: how to repeatedly "loop" the results from a function?) suggested how to create a loop:

sensitivity <- list()
for (time in names(ranger_preds)) {
    prediction <- ranger_preds[which(names(ranger_preds) == time)] > 0.5
    real <- cost_test$time >= as.numeric(time)
    confusion <- confusionMatrix(as.factor(prediction), as.factor(real), positive = 'TRUE')
    sensitivity[[time]] <- confusion$byClass[1]
}

But due to some of the observations used in this loop, I get the following error:

Error in confusionMatrix.default(as.factor(prediction), as.factor(real),  : 
  The data must contain some levels that overlap the reference.

Does anyone know how to fix this? Thanks


Answer 1:


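For some time points, prediction and/or real contain only one unique value, so as.factor() creates factors whose levels differ (for example, only "TRUE" on one side and only "FALSE" on the other). Set the levels of both factors explicitly so that they are always the same.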
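
The mismatch is easy to reproduce in base R, without caret. The sketch below uses made-up logical vectors (not the cost data) to show how as.factor() drops a level that never occurs, while factor() with explicit levels keeps both. The assumption here is that non-overlapping levels are what triggers the error message quoted in the question.

```r
# Hypothetical vectors, for illustration only (not taken from the cost data)
prediction <- c(TRUE, TRUE, TRUE)     # every predicted survival probability > 0.5
real       <- c(FALSE, FALSE, FALSE)  # no observation survived past this time

# as.factor() only creates levels for values that actually occur
lv_pred <- levels(as.factor(prediction))   # "TRUE"
lv_real <- levels(as.factor(real))         # "FALSE"
any(lv_pred %in% lv_real)                  # FALSE: the two factors share no level

# Explicit levels keep both categories even when one of them is absent
fixed_pred <- factor(prediction, levels = c(TRUE, FALSE))
fixed_real <- factor(real, levels = c(TRUE, FALSE))
identical(levels(fixed_pred), levels(fixed_real))   # TRUE
```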

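# Compute the sensitivity at every unique death time
result <- sapply(names(ranger_preds), function(x) {
  # Fix both factors to the same two levels, even if only one value occurs
  prediction <- factor(ranger_preds[[x]] > 0.5, levels = c(TRUE, FALSE))
  real <- factor(cost_test$time >= as.numeric(x), levels = c(TRUE, FALSE))
  confusion <- caret::confusionMatrix(prediction, real, positive = 'TRUE')
  confusion$byClass[1]  # first entry of byClass is the sensitivity
}, USE.NAMES = FALSE)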

result
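
The question also asked for an explicit loop, a plot of sensitivity against time, and the average sensitivity. A minimal sketch of those steps using only base R follows. The function name sensitivity_by_time, the toy matrix pred_surv and the toy vector event_time are all hypothetical stand-ins for ranger_preds and cost_test$time, and sensitivity is computed by hand as TP / (TP + FN) rather than through caret.

```r
# Hypothetical helper: sensitivity at each time point, computed by hand.
# pred_surv : numeric matrix of predicted survival probabilities,
#             one row per observation, one column per time point
# event_time: observed time for each observation
# times     : numeric time point belonging to each column of pred_surv
sensitivity_by_time <- function(pred_surv, event_time, times, threshold = 0.5) {
  sens <- numeric(length(times))
  for (j in seq_along(times)) {
    # Fix both factors to the same two levels so the table is always 2 x 2
    prediction <- factor(pred_surv[, j] > threshold, levels = c(TRUE, FALSE))
    real       <- factor(event_time >= times[j], levels = c(TRUE, FALSE))
    tab <- table(prediction, real)
    # TP / (TP + FN); NaN when no observation is truly positive at this time
    sens[j] <- tab["TRUE", "TRUE"] / sum(tab[, "TRUE"])
  }
  sens
}

# Toy data (made up): 4 observations, 4 time points
times      <- c(1, 2, 3, 4)
event_time <- c(3, 2, 1, 2)
pred_surv  <- matrix(c(0.9, 0.8, 0.6, 0.40,
                       0.9, 0.4, 0.3, 0.20,
                       0.7, 0.6, 0.2, 0.10,
                       0.8, 0.3, 0.1, 0.05), nrow = 4, byrow = TRUE)

sens <- sensitivity_by_time(pred_surv, event_time, times)

# Average sensitivity, skipping time points where it is undefined (NaN)
avg <- mean(sens, na.rm = TRUE)

# Sensitivity against time; undefined points are simply not drawn
plot(times, sens, type = "b", xlab = "Time", ylab = "Sensitivity")
```

With the objects from the question, the call would presumably be sensitivity_by_time(as.matrix(ranger_preds), cost_test$time, ranger_fit$unique.death.times). The result vector from the sapply() call can be summarised in the same way with mean(result, na.rm = TRUE).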


Source: https://stackoverflow.com/questions/65118371/how-to-create-a-loop-when-levels-do-not-overlap-the-reference
