r-caret | 易学教程

R - caret createDataPartition returns more samples than expected

Question: I'm trying to split the iris dataset into a training set and a test set. I used createDataPartition() like this:

    library(caret)
    createDataPartition(iris$Species, p=0.1)
    # [1] 12 22 26 41 42 57 63 79 89 93 114 117 134 137 142
    createDataPartition(iris$Sepal.Length, p=0.1)
    # [1] 1 27 44 46 54 68 72 77 83 84 93 99 104 109 117 132 134

I understand the first result: it is a vector of 0.1*150 elements (150 is the number of samples in the dataset). However, I should have the same vector on the second…

user defined summaryFunction in caret, logloss

Question: Using the caret package, I am having trouble getting the following user-defined summary function to work. It is supposed to calculate the log loss, but I keep getting an error saying that logloss is not found. Below is a reproducible example:

    data <- data.frame('target' = sample(c('Y','N'),100,replace = T), 'X1' = runif(100), 'X2' = runif(100))
    log.loss2 <- function(data, lev = NULL, model = NULL) {
      logloss = -sum(data$obs*log(data$Y) + (1-data$obs)*log(1-data$Y))/length(data$obs)
      names(logloss) <- c('LL')
    …

calculate PPV and NPV during model training with caret

Question: I am using the caret package to train models for a classification problem. I know that defaultSummary can be used to calculate Accuracy/Kappa (and SDs), and that twoClassSummary will calculate Sens/Spec. I would also like to calculate positive and negative predictive values (PPV/NPV, and SDs) as easily as possible, in the same fashion. I have come up with a solution, but wonder if anyone could confirm that it appears reasonable. First, I generate the predictive values: predictiveValues <…

Parallel processing within a function with caret model

Question: I am attempting to create an all-in-one parallel processing function for training caret models with different inputs. I want the function to be its own process, independent of all other calls. The function I have developed so far seems to be reproducible with some models and not with others. For example, below I train a gbm on the iris data set, which fails to reproduce, and then train an rpart model, which does reproduce (aside from the time difference). Is my function sound? Is it okay to specify the…

preprocess within cross-validation in caret

Question: I have a question about data preprocessing that needs to be clarified. To my understanding, when we tune hyperparameters and estimate model performance via cross-validation, we need to preprocess within cross-validation rather than preprocess the whole dataset. In other words, in cross-validation we preprocess the training folds, then use the same preprocessing parameters to process the test fold and make predictions. In the example code below, when I specify preProcess within caret::train, does it…

Using caret to optimize for deviance with binary classification

Question: (Example borrowed from "Fatal error with train() in caret on Windows 7, R 3.0.2, caret 6.0-21".) I have this example:

    library("AppliedPredictiveModeling")
    library("caret")
    data("AlzheimerDisease")
    data <- data.frame(predictors, diagnosis)
    tuneGrid <- expand.grid(interaction.depth = 1:2, n.trees = 100, shrinkage = 0.1)
    trainControl <- trainControl(method = "cv", number = 5, verboseIter = TRUE)
    gbmFit <- train(diagnosis ~ ., data = data, method = "gbm", trControl = trainControl, tuneGrid =…

Extracting values and plot Box-plot from forecast objects

Question: I made some forecasts with the forecast package using several models. Examples of these models are shown below:

    # CODE
    library(fpp2) # required for the data
    library(dplyr)
    library(forecast)
    #HOLT WINTER
    fc <- hw(subset(hyndsight,end=length(hyndsight)-35), damped = TRUE, seasonal="multiplicative", h=35)
    autoplot(hyndsight) + autolayer(fc, series="HW multi damped", PI=FALSE)+ guides(colour=guide_legend(title="Daily forecasts"))
    #ETS
    ets_f <- forecast(subset(hyndsight,end=length(hyndsight)-35), , h=35)
    …

Caret: There were missing values in resampled performance measures

Question: I am running caret's neural network on the Bike Sharing dataset and I get the following error message:

    In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, : There were missing values in resampled performance measures.

I am not sure what the problem is. Can anyone help, please? The dataset is from: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

Here is the code:

    library(caret)
    library(bestNormalize)
    data_hour = read.csv("hour.csv")
    # Split dataset
    set.seed(3)
    …