r-caret

Caret returns different predictions with caret train object than it does with the extracted final model

落花浮王杯 posted on 2021-02-19 05:25:47
Question: I prefer to use caret when fitting models because of its relative speed and preprocessing capabilities. However, I'm slightly confused about how it makes predictions. When I compare predictions made directly from the train object with predictions made from the extracted final model, I see very different numbers. The predictions from the train object appear to be more accurate.
library(caret)
library(ranger)
x1 <- rnorm(100)
x2 <- rbeta(100, 1, 1)
y <- 2*x1 + x2 + 5*x1*x2
data <- data.frame

Specifying positive class of an outcome variable in caret train()

允我心安 posted on 2021-02-19 02:29:51
Question: I'm wondering whether there is a way to specify which class of the outcome variable is the positive one in caret's train() function. A minimal example:
# Settings
ctrl <- trainControl(method = "repeatedcv", number = 10, savePredictions = TRUE, summaryFunction = twoClassSummary, classProbs = TRUE)
# Data
data <- mtcars %>% mutate(am = factor(am, levels = c(0,1), labels = c("automatic", "manual"), ordered = TRUE))
# Train
set.seed(123)
model1 <- train(am ~ disp + wt, data = data, method = "glm", family =

recipes::step_dummy + caret::train -> Error: Not all variables in the recipe are present

痴心易碎 posted on 2021-02-11 15:21:03
Question: I am getting the following error when using recipes::step_dummy with caret::train (my first attempt at combining the two packages):
Error: Not all variables in the recipe are present in the supplied training set
I'm not sure what is causing the error or how best to debug it. Help with training the model would be much appreciated.
library(caret)
library(tidyverse)
library(recipes)
library(rsample)
data("credit_data")
## Split the data into training (75%) and test sets (25%)
set.seed(100)
train_test_split <-

Error using Caret Package with “knn” method — Something is wrong; all the Accuracy metric values are missing

为君一笑 posted on 2021-02-11 13:07:48
Question: Hi, I am using the caret package to train a model with the knn algorithm, but I am running into an error. I am using the German credit data, and this is the structure of the data frame:
'data.frame': 1000 obs. of 21 variables:
 $ checking_balance    : Factor w/ 4 levels "< 0 DM","> 200 DM",..: 1 3 4 1 1
 $ months_loan_duration: int 6 48 12 42 24 36 24 36 12 30 ...
 $ credit_history      : Factor w/ 5 levels "critical","delayed",..: 1 5 1 5
 $ purpose             : Factor w/ 10 levels "business","car (new)",..: 8 8 5
 $

train, validation, test split model in CARET in R

南笙酒味 posted on 2021-02-11 08:35:59
Question: I would like to ask for help, please. I use this code to run an XGBoost model with the caret package. However, I want to use a validation split based on time: 60% training, 20% validation, 20% testing. I have already split the data, but I do not know how to handle the validation data when it is not cross-validation. Thank you.
xgb_trainControl = trainControl(
  method = "cv",
  number = 5,
  returnData = FALSE
)
xgb_grid <- expand.grid(nrounds = 1000, eta = 0.01, max_depth = 8, gamma = 1, colsample

ggplot2 Heatmap 2 Different Color Schemes - Confusion Matrix: Matches in Different Color Scheme than Misclassifications

自作多情 posted on 2021-02-11 05:54:18
Question: I adapted a heatmap plot for a confusion matrix from this answer. However, I would like to tweak it. The diagonal (from top left to bottom right) holds the matches (correct classifications). My aim is to plot this diagonal in a yellow color palette, and the mismatches (all tiles except those on the diagonal) in a red color palette. In my plot.cm function I can get the diagonal with
cm_d$diag <- cm_d$Prediction == cm_d$Reference # Get the Diagonal
cm_d$ndiag <- cm_d$Prediction != cm_d

Error when trying to pass custom metric in Caret package

泪湿孤枕 posted on 2021-02-10 19:32:29
Question: Related question - 1
I have a dataset like so:
> head(training_data)
  year     month channelGrouping visitStartTime visitNumber timeSinceLastVisit browser
1 2016   October          Social     1477775021           1                  0  Chrome
2 2016 September          Social     1473037945           1                  0  Safari
3 2017      July  Organic Search     1500305542           1                  0  Chrome
4 2017      July  Organic Search     1500322111           2              16569  Chrome
5 2016    August          Social     1471890172           1                  0  Safari
6 2017       May          Direct     1495146428           1                  0  Chrome
  operatingSystem isMobile continent subContinent country source medium
1
