Specifying positive class of an outcome variable in caret train()

Posted by 允我心安 on 2021-02-19 02:29:51

Question


I'm wondering if there is a way to specify which class of the outcome variable is positive in caret's train() function. A minimal example:

# Packages
library(caret)
library(dplyr)  # provides %>% and mutate()

# Settings
ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE, summaryFunction = twoClassSummary, classProbs = TRUE)

# Data
data <- mtcars %>% mutate(am = factor(am, levels = c(0,1), labels = c("automatic", "manual"), ordered = TRUE))

# Train
set.seed(123)
model1 <- train(am ~ disp + wt, data = data, method = "glm", family = "binomial", trControl = ctrl, tuneLength = 5)

# Data (factor ordering switched)
data <- mtcars %>% mutate(am = factor(am, levels = c(1,0), labels = c("manual", "automatic"), ordered = TRUE))

# Train
set.seed(123)
model2 <- train(am ~ disp + wt, data = data, method = "glm", family = "binomial", trControl = ctrl, tuneLength = 5)

# Sensitivity and Specificity are switched
model1
model2

If you run the code, you'll notice that the Sensitivity and Specificity metrics are swapped between the two models. It looks like train() treats the first level of a factor outcome variable as the positive class. Is there a way to specify the positive class in the function itself, so that I get the same results regardless of the factor level ordering? I tried adding positive = "manual", but this results in an error.


Answer 1:


I believe @Johannes's answer is an example of over-engineering a simple process.

Simply reverse the order of your factor levels:

   df$target <- factor(df$target, levels=rev(levels(df$target)))
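
As a small illustration of the one-liner above, here is a sketch on the mtcars data from the question. It assumes an unordered factor; the names am and am_rev are only for illustration. Reversing the levels changes which level comes first, but not the value stored for any observation.

```r
# Build the outcome as in the question (unordered here for simplicity)
am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))
levels(am)      # "automatic" comes first, so caret would treat it as positive

# Reverse the level order; the underlying labels per row stay the same
am_rev <- factor(am, levels = rev(levels(am)))
levels(am_rev)  # "manual" now comes first

# Sanity check: no observation changed its label
stopifnot(identical(as.character(am), as.character(am_rev)))
```

For an unordered factor, relevel(am, ref = "manual") should give the same ordering; relevel() does not accept ordered factors, so the rev(levels()) form is the safer choice for the ordered factor used in the question.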



Answer 2:


The issue lies not in train() but in the summary function twoClassSummary(), which looks like this:

function (data, lev = NULL, model = NULL) 
{
  lvls <- levels(data$obs)

  [...]    

  out <- c(rocAUC, 
           sensitivity(data[, "pred"], data[, "obs"], 
             lev[1]),  # Hard coded positive class
           specificity(data[, "pred"], data[, "obs"], 
             lev[2])) # Hard coded negative class
  names(out) <- c("ROC", "Sens", "Spec")
  out
}

The levels passed to sensitivity() and specificity() are hard-coded here: the first level (lev[1]) is always treated as the positive class and the second (lev[2]) as the negative class.
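
To see the effect of that choice in isolation, the following sketch calls caret's sensitivity() and specificity() directly on a small hand-made example. The obs and pred vectors are invented purely for illustration, and the sketch assumes caret is installed. The same predictions yield swapped numbers depending on which level is named as positive.

```r
library(caret)

# Toy data (made up for illustration): 3 manual and 2 automatic cars,
# with one manual car misclassified as automatic
lv   <- c("automatic", "manual")
obs  <- factor(c("manual", "manual", "manual", "automatic", "automatic"), levels = lv)
pred <- factor(c("manual", "manual", "automatic", "automatic", "automatic"), levels = lv)

# "manual" as the positive class
sens_manual    <- sensitivity(pred, obs, positive = "manual")     # 2 of 3 manual cars found
spec_automatic <- specificity(pred, obs, negative = "automatic")  # 2 of 2 automatic cars found

# "automatic" as the positive class: the two values swap
sens_automatic <- sensitivity(pred, obs, positive = "automatic")
spec_manual    <- specificity(pred, obs, negative = "manual")

c(sens_manual, spec_automatic, sens_automatic, spec_manual)
```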

As @Seymour points out very correctly, reversing the order of the levels of the outcome variable fixes the issue.

df$target <- factor(df$target, levels=rev(levels(df$target)))

If you are not willing to change the order of the levels, there is a non-intrusive way to adapt the twoClassSummary() function.

sensitivity() takes the name of the positive level and specificity() takes the name of the negative level (a suboptimal design choice, since the two have to be kept consistent by hand). We therefore add positive and negative arguments to a custom copy of the function and, further down, pass them on to the respective calls.

customTwoClassSummary <- function(data, lev = NULL, model = NULL, positive = NULL, negative=NULL) 
{
  lvls <- levels(data$obs)
  if (length(lvls) > 2) 
    stop(paste("Your outcome has", length(lvls), "levels. The twoClassSummary() function isn't appropriate."))
  caret:::requireNamespaceQuietStop("ModelMetrics")
  if (!all(levels(data[, "pred"]) == lvls)) 
    stop("levels of observed and predicted data do not match")
  rocAUC <- ModelMetrics::auc(ifelse(data$obs == lev[2], 0, 
                                     1), data[, lvls[1]])
  out <- c(rocAUC, 
           # Only change happens here!
           sensitivity(data[, "pred"], data[, "obs"], positive=positive), 
           specificity(data[, "pred"], data[, "obs"], negative=negative))
  names(out) <- c("ROC", "Sens", "Spec")
  out
}

But how can these arguments be set without changing more code within the package? By default, caret doesn't pass extra options to the summary function. So we wrap the function in an anonymous function in the call to trainControl():

ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE, 
                     # Trick: an anonymous wrapper fixes the extra arguments of the call
                     summaryFunction = function(...) customTwoClassSummary(..., 
                                       positive = "manual", negative="automatic"), 
                     classProbs = TRUE)

The ... argument makes sure that all other arguments that caret passes to the anonymous function get passed on to customTwoClassSummary().
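
The forwarding mechanism can be tried out in base R without caret. In this sketch, toy_summary and wrapped are hypothetical names used only for illustration, and the call at the end merely imitates how a summary function is invoked with data, lev and model arguments.

```r
# Hypothetical stand-in for a summary function with one extra argument
toy_summary <- function(data, lev = NULL, model = NULL, positive = NULL) {
  paste("positive class:", positive, "| rows:", nrow(data))
}

# Anonymous-style wrapper: forwards everything it receives via ...,
# and fixes the extra argument to a chosen value
wrapped <- function(...) toy_summary(..., positive = "manual")

# The caller only supplies the usual three arguments
result <- wrapped(mtcars, lev = c("automatic", "manual"), model = "glm")
result
```

Because the wrapper has the same calling interface as the original summary function, the code that calls it does not need to know about the extra argument at all.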



Source: https://stackoverflow.com/questions/45333029/specifying-positive-class-of-an-outcome-variable-in-caret-train
