r-caret

R - caret createDataPartition returns more samples than expected

徘徊边缘 submitted on 2020-07-18 20:09:31
Question: I'm trying to split the iris dataset into a training set and a test set. I used createDataPartition() like this:

    library(caret)
    createDataPartition(iris$Species, p=0.1)
    # [1] 12 22 26 41 42 57 63 79 89 93 114 117 134 137 142
    createDataPartition(iris$Sepal.Length, p=0.1)
    # [1] 1 27 44 46 54 68 72 77 83 84 93 99 104 109 117 132 134

I understand the first result: I get a vector of 0.1*150 elements (150 is the number of samples in the dataset). However, I should have the same vector on the second ...
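
The element counts in the question above can be reasoned about by hand. The sketch below uses base R only and does not call caret. It rests on an assumption about how createDataPartition() behaves, not on anything stated in the excerpt: that sampling happens within each class of a factor (or within quantile bins of a numeric vector) and that each group's share is rounded up. The variable names and the choice of five quantile break points are illustrative.

```r
# Base R only. The per-group rounding-up rule is an assumption about caret,
# used here to illustrate the arithmetic, not a copy of its implementation.
p <- 0.1

# Factor outcome: iris has three classes of 50 rows each.
class_sizes <- table(iris$Species)
n_species <- sum(ceiling(class_sizes * p))

# Numeric outcome: bin by quantiles first (five break points assumed),
# then round each bin's share up.
breaks <- unique(quantile(iris$Sepal.Length, probs = seq(0, 1, length.out = 5)))
bins <- cut(iris$Sepal.Length, breaks, include.lowest = TRUE)
n_sepal <- sum(ceiling(table(bins) * p))

print(n_species)
print(n_sepal)
```

Under this assumption, rounding up inside each bin means the total for a numeric vector can be larger than 0.1*150, while three equal classes of 50 give exactly 5 rows each.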

user defined summaryFunction in caret, logloss

旧时模样 submitted on 2020-06-28 05:44:10
Question: Using the caret package, I am having trouble getting the following user-defined summary function to work. It is supposed to calculate the log loss, but I keep getting a message that logloss is not found. Below is a reproducible example:

    data <- data.frame('target' = sample(c('Y','N'),100,replace = T), 'X1' = runif(100), 'X2' = runif(100))
    log.loss2 <- function(data, lev = NULL, model = NULL) {
      logloss = -sum(data$obs*log(data$Y) + (1-data$obs)*log(1-data$Y))/length(data$obs)
      names(logloss) <- c('LL')
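
For comparison, here is a minimal base R sketch of a binary log-loss computation. It is not the asker's function and not part of caret: log_loss, its arguments, and the sample values are hypothetical names chosen for illustration. It assumes the observed classes arrive as a factor, so they are converted to 0/1 before any arithmetic, since arithmetic applied directly to a factor returns NA.

```r
# Hypothetical helper: binary log loss from observed classes and the
# predicted probability of the positive class.
log_loss <- function(obs, prob_pos, positive = "Y", eps = 1e-15) {
  y <- as.numeric(obs == positive)         # 1 for the positive class, 0 otherwise
  p <- pmin(pmax(prob_pos, eps), 1 - eps)  # clip probabilities to avoid log(0)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Made-up inputs, purely for illustration.
obs <- factor(c("Y", "N", "Y", "N"), levels = c("Y", "N"))
prob_y <- c(0.9, 0.2, 0.6, 0.4)
ll <- log_loss(obs, prob_y)
print(ll)
```

The clipping constant eps is a common safeguard rather than a requirement; without it a predicted probability of exactly 0 or 1 would produce an infinite loss.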

calculate PPV and NPV during model training with caret

萝らか妹 submitted on 2020-05-28 03:20:28
Question: I am using the caret package to train models for a classification problem. I know that defaultSummary can be used to calculate Accuracy/Kappa (and SDs), and twoClassSummary will calculate Sens/Spec. I would also like to calculate positive and negative predictive values (PPV/NPV, and SDs) as easily as possible, in the same fashion. I have come up with a solution, but wonder if anyone could confirm that it appears reasonable. First, I generate the predictive values:

    predictiveValues <

Parallel processing within a function with caret model

对着背影说爱祢 submitted on 2020-05-17 08:47:00
Question: I am attempting to create an all-in-one parallel processing function for training caret models with different inputs. I want the function to be its own process, independent of all other calls. The function that I have developed so far seems to be reproducible with some models and not with others. For example, below I train a gbm on the iris data set, which fails to reproduce, then train an rpart model, which does reproduce (aside from the time difference). Is my function sound? Is it okay to specify the ...

preprocess within cross-validation in caret

╄→尐↘猪︶ㄣ submitted on 2020-05-13 06:22:13
Question: I have a question about data preprocessing that I would like clarified. To my understanding, when we tune hyperparameters and estimate model performance via cross-validation, we need to preprocess within cross-validation rather than preprocess the whole dataset. In other words, in cross-validation, we preprocess the training folds, then use the same preprocessing parameters to process the test fold and make predictions. In the example code below, when I specify preProcess within caret::train, does it ...
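
The procedure described above (estimate preprocessing parameters on the training folds, then reuse them on the held-out fold) can be sketched in base R. This is an illustration of the idea only and makes no claim about what caret::train does internally; the fold assignment, the choice of centering and scaling, and all variable names are assumptions made for the example.

```r
set.seed(1)
x <- iris$Sepal.Length

# Hypothetical 5-fold assignment, shuffled so folds are roughly balanced.
folds <- sample(rep(1:5, length.out = length(x)))

scaled_test <- vector("list", 5)
for (k in 1:5) {
  train_x <- x[folds != k]
  test_x  <- x[folds == k]

  # Parameters are estimated on the training folds only.
  mu <- mean(train_x)
  s  <- sd(train_x)

  # The same parameters are reused on the held-out fold.
  scaled_test[[k]] <- (test_x - mu) / s
}
```

Because mu and s never see the held-out fold, the scaled test values are generally not exactly zero-mean or unit-variance, which is the expected consequence of avoiding leakage.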

Using caret to optimize for deviance with binary classification

与世无争的帅哥 submitted on 2020-05-12 05:17:56
Question: (example borrowed from "Fatal error with train() in caret on Windows 7, R 3.0.2, caret 6.0-21") I have this example:

    library("AppliedPredictiveModeling")
    library("caret")
    data("AlzheimerDisease")
    data <- data.frame(predictors, diagnosis)
    tuneGrid <- expand.grid(interaction.depth = 1:2, n.trees = 100, shrinkage = 0.1)
    trainControl <- trainControl(method = "cv", number = 5, verboseIter = TRUE)
    gbmFit <- train(diagnosis ~ ., data = data, method = "gbm", trControl = trainControl, tuneGrid =

Extracting values and plot Box-plot from forecast objects

◇◆丶佛笑我妖孽 submitted on 2020-04-18 06:11:59
Question: I made some forecasts with the forecast package using several models. Examples of these models are shown below:

    # CODE
    library(fpp2) # required for the data
    library(dplyr)
    library(forecast)
    #HOLT WINTER
    fc <- hw(subset(hyndsight,end=length(hyndsight)-35), damped = TRUE, seasonal="multiplicative", h=35)
    autoplot(hyndsight) + autolayer(fc, series="HW multi damped", PI=FALSE)+ guides(colour=guide_legend(title="Daily forecasts"))
    #ETS
    ets_f <- forecast(subset(hyndsight,end=length(hyndsight)-35), , h=35)

Caret: There were missing values in resampled performance measures

﹥>﹥吖頭↗ submitted on 2020-04-08 18:28:34
Question: I am running caret's neural network on the Bike Sharing dataset and I get the following message:

    In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.

I am not sure what the problem is. Can anyone help, please? The dataset is from: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset Here is the code:

    library(caret)
    library(bestNormalize)
    data_hour = read.csv("hour.csv")
    # Split dataset
    set.seed(3)