I used caret for logistic regression in R:
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10,
savePredicti
Caret already has summary functions to output all the metrics you mention:
defaultSummary outputs Accuracy and Kappa
twoClassSummary outputs AUC (area under the ROC curve, see the last lines of this answer), sensitivity and specificity
prSummary outputs precision and recall (plus the F measure and the area under the precision-recall curve)
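As a rough illustration of what these class-based metrics measure, here is a minimal base R sketch that computes them by hand from a confusion table. The labels and predictions are made-up toy data, and treating the first factor level ("M") as the positive class is an assumption chosen to mirror how the two-class summaries are usually set up.

```r
# Hypothetical toy labels; "M" (first level) is treated as the positive class.
obs  <- factor(c("M", "M", "M", "M", "R", "R", "R", "R", "R", "R"), levels = c("M", "R"))
pred <- factor(c("M", "M", "M", "R", "M", "M", "R", "R", "R", "R"), levels = c("M", "R"))

cm <- table(pred, obs)                 # rows: predicted, columns: observed
tp <- cm["M", "M"]; fn <- cm["R", "M"]
fp <- cm["M", "R"]; tn <- cm["R", "R"]
n  <- sum(cm)

accuracy   <- (tp + tn) / n
# Agreement expected by chance, from the row and column totals
expected   <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - expected) / (1 - expected)

sens      <- tp / (tp + fn)            # sensitivity, identical to recall
spec      <- tn / (tn + fp)            # specificity
precision <- tp / (tp + fp)
recall    <- sens
f_meas    <- 2 * precision * recall / (precision + recall)

metrics <- c(Accuracy = accuracy, Kappa = kappa_stat, Sens = sens, Spec = spec,
             Precision = precision, Recall = recall, F = f_meas)
round(metrics, 4)
```

Note that sensitivity and recall are the same quantity, which is why the Sens and Recall columns in a combined summary are identical.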
To get the combined metrics, you can write your own summary function that combines the outputs of these three:
library(caret)
MySummary <- function(data, lev = NULL, model = NULL) {
  a1 <- defaultSummary(data, lev, model)   # Accuracy, Kappa
  b1 <- twoClassSummary(data, lev, model)  # ROC, Sens, Spec
  c1 <- prSummary(data, lev, model)        # AUC (precision-recall), Precision, Recall, F
  out <- c(a1, b1, c1)
  out
}
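To check a summary function like this before plugging it into train, it can be called directly on a small data frame laid out the way caret passes held-out predictions: an obs column, a pred column, and one probability column per class level. The sketch below uses made-up toy values; the data frame name toy is hypothetical, and depending on the caret version, twoClassSummary and prSummary may need the pROC and MLmetrics packages installed.

```r
library(caret)  # assumption: pROC and MLmetrics are also installed

# Same combined summary as above, repeated so this sketch runs on its own.
MySummary <- function(data, lev = NULL, model = NULL) {
  c(defaultSummary(data, lev, model),
    twoClassSummary(data, lev, model),
    prSummary(data, lev, model))
}

# Hypothetical held-out predictions: observed class plus the
# predicted probability of each class (columns named after the levels).
toy <- data.frame(
  obs = factor(c("M", "M", "M", "M", "R", "R", "R", "R", "R", "R"),
               levels = c("M", "R")),
  M   = c(0.9, 0.8, 0.7, 0.4, 0.6, 0.55, 0.3, 0.2, 0.2, 0.1)
)
toy$R    <- 1 - toy$M
# Hard class predictions at a 0.5 cutoff
toy$pred <- factor(ifelse(toy$M > 0.5, "M", "R"), levels = c("M", "R"))

res <- MySummary(toy, lev = levels(toy$obs))
names(res)
```

The names of the returned vector are what later show up as column names in the results table.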
Let's try it on the Sonar data set:
library(mlbench)
data("Sonar")
When defining the train control, it is important to set classProbs = TRUE, since some of these metrics (ROC and the precision-recall AUC) cannot be calculated from the predicted classes; they require the predicted probabilities.
ctrl <- trainControl(method = "repeatedcv",
                     number = 10,
                     savePredictions = TRUE,
                     summaryFunction = MySummary,
                     classProbs = TRUE)
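To see why class probabilities are needed for the ROC metric, here is a minimal base R sketch that computes the area under the ROC curve by hand using the rank (Mann-Whitney) formulation. The data are made up, the helper name roc_auc is hypothetical, and this is not how caret computes the value internally; it only illustrates that the metric depends on the ordering of the scores, which hard class labels discard.

```r
# Hypothetical toy data: true classes and predicted probabilities of class "M".
obs  <- factor(c("M", "M", "M", "R", "R", "R"), levels = c("M", "R"))
prob <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1)

# ROC AUC as the probability that a randomly chosen positive case
# receives a higher score than a randomly chosen negative case.
roc_auc <- function(obs, prob, positive = levels(obs)[1]) {
  is_pos <- obs == positive
  r      <- rank(prob)                 # average ranks handle ties
  n_pos  <- sum(is_pos)
  n_neg  <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc_from_probs <- roc_auc(obs, prob)
auc_from_probs

# Hard labels at a 0.5 cutoff keep only a single point of the ROC curve.
pred_class <- factor(ifelse(prob > 0.5, "M", "R"), levels = levels(obs))
table(pred_class, obs)
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9, while the confusion table at a single cutoff carries no information about that ranking.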
Now fit the model of your choice:
mod_fit <- train(Class ~ .,
                 data = Sonar,
                 method = "rf",
                 trControl = ctrl)
mod_fit$results
#output
mtry Accuracy Kappa ROC Sens Spec AUC Precision Recall F AccuracySD KappaSD
1 2 0.8364069 0.6666364 0.9454798 0.9280303 0.7333333 0.8683726 0.8121087 0.9280303 0.8621526 0.10570484 0.2162077
2 31 0.8179870 0.6307880 0.9208081 0.8840909 0.7411111 0.8450612 0.8074942 0.8840909 0.8374326 0.06076222 0.1221844
3 60 0.8034632 0.6017979 0.9049242 0.8659091 0.7311111 0.8332068 0.7966889 0.8659091 0.8229330 0.06795824 0.1369086
ROCSD SensSD SpecSD AUCSD PrecisionSD RecallSD FSD
1 0.04393947 0.05727927 0.1948585 0.03410854 0.12717667 0.05727927 0.08482963
2 0.04995650 0.11053858 0.1398657 0.04694993 0.09075782 0.11053858 0.05772388
3 0.04965178 0.12047598 0.1387580 0.04820979 0.08951728 0.12047598 0.06715206
In this output, ROC is in fact the area under the ROC curve (usually called AUC), and AUC is the area under the precision-recall curve across all cutoffs.