r-caret | 易学教程

Caret and KNN in R: predict function gives error

Question: I am trying to predict with a simplified KNN model using the caret package in R. It always gives the same error, even in this very simple reproducible example:

    library(caret)
    set.seed(1)
    #generate training dataset "a"
    n = 10000
    a = matrix(rnorm(n*8,sd=1000000),nrow = n)
    y = round(runif(n))
    a = cbind(y,a)
    a = as.data.frame(a)
    a[,1] = as.factor(a[,1])
    colnames(a) = c("y",paste0("V",1:8))
    #estimate simple KNN model
    ctrl <- trainControl(method="none",repeats = 1)
    knnFit <- train(y ~ ., data = a,

How to implement a hold-out validation in R

Let's say I'm using the Sonar data and I'd like to perform a hold-out validation in R. I partitioned the data using the createFolds function from the caret package as folds <- createFolds(mydata$Class, k=5). I would then like to use exactly the fold mydata[i] as test data and train a classifier using mydata[-i] as training data. My first thought was to use the train function, but I couldn't find any support for hold-out validation. Am I missing something here? Also, I'd like to be able to pass exactly the pre-defined folds as a parameter, instead of letting the function partition the data. Does anyone have any
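One way to read the request above is a manual loop over the pre-defined folds. The following is a minimal sketch rather than the asker's actual setup: `mydata` is a hypothetical stand-in for the Sonar data, the predictors `x1` and `x2` are invented, and a plain `glm` classifier is an assumption standing in for whatever model is wanted. Note that rows are selected with `mydata[idx, ]`, since `mydata[idx]` would select columns of a data frame.

```r
library(caret)

set.seed(1)

# Hypothetical stand-in for the asker's data: two predictors and a two-level class.
mydata <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
mydata$Class <- factor(ifelse(mydata$x1 + rnorm(100) > 0, "M", "R"))

# createFolds returns a list of held-out row indices, one element per fold.
folds <- createFolds(mydata$Class, k = 5)

# For each fold: train on all other rows, evaluate on the held-out rows.
acc <- sapply(folds, function(test_idx) {
  train_set <- mydata[-test_idx, ]
  test_set  <- mydata[test_idx, ]

  # Assumed classifier: logistic regression via base R glm.
  fit <- glm(Class ~ x1 + x2, data = train_set, family = binomial)

  # For a factor response, glm models the probability of the second level.
  prob <- predict(fit, newdata = test_set, type = "response")
  pred <- factor(ifelse(prob > 0.5,
                        levels(mydata$Class)[2],
                        levels(mydata$Class)[1]),
                 levels = levels(mydata$Class))

  mean(pred == test_set$Class)  # hold-out accuracy for this fold
})

print(acc)
```

Each element of `acc` is the accuracy on one held-out fold. Using only `folds[[1]]` instead of looping gives a single hold-out split.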

R caret / rfe variable selection for factors() AND NAs

Question: I have a data set with NAs sprinkled generously throughout. In addition, it has columns that need to be factors(). I am using the rfe() function from the caret package to select variables. It seems that the functions= argument in rfe() with lmFuncs works for data with NAs but NOT for factor variables, while rfFuncs works for factor variables but NOT for NAs. Any suggestions for dealing with this? I tried model.matrix(), but it seems to just cause more problems. Answer 1: Because of inconsistent

How to predict on a new dataset using caretEnsemble package in R?

I am currently using the caretEnsemble package in R to combine multiple models trained in caret. I have obtained the list of final trained models (say model_list) using the caretList function from the same package as follows.

    model_list <- caretList(
      x = input_predictors,
      y = input_labels,
      metric = 'Accuracy',
      tuneList = list(
        randomForestModel = caretModelSpec(method='rf', tuneLength=1, preProcess=c('BoxCox', 'center', 'scale')),
        ldaModel = caretModelSpec(method='lda', tuneLength=1, preProcess=c('BoxCox', 'center', 'scale')),
        logisticRegressionModel = caretModelSpec(method='glm', tuneLength=1,

R Confusion Matrix sensitivity and specificity labeling

I am using R v3.3.2 and caret 6.0.71 (i.e. the latest versions) to construct a logistic regression classifier. I am using the confusionMatrix function to create stats for judging its performance.

    logRegConfMat <- confusionMatrix(logRegPrediction, valData[,"Seen"])

    Reference 0, Prediction 0 = 30
    Reference 1, Prediction 0 = 14
    Reference 0, Prediction 1 = 60
    Reference 1, Prediction 1 = 164

    Accuracy : 0.7239
    Sensitivity : 0.3333
    Specificity : 0.9213

The target value in my data (Seen) uses 1 for true and 0 for false. I assume the Reference (Ground truth) columns and Prediction (Classifier) rows in the
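The reported figures can be reproduced by hand from the four counts in the question, which shows which class is being treated as "positive". The sketch below uses only base R. The variable names are made up for illustration. caret's confusionMatrix has a `positive` argument, which for two-class problems defaults to the first factor level (here "0").

```r
# Rebuild the 2x2 table from the counts quoted in the question.
# Rows are predictions, columns are the reference (ground truth).
cm <- matrix(c(30, 60, 14, 164), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

accuracy <- sum(diag(cm)) / sum(cm)

# Treating "0" as the positive class (the first factor level):
sens_pos0 <- cm["0", "0"] / sum(cm[, "0"])  # 30 / (30 + 60)
spec_pos0 <- cm["1", "1"] / sum(cm[, "1"])  # 164 / (164 + 14)

# Treating "1" as the positive class swaps the two values:
sens_pos1 <- cm["1", "1"] / sum(cm[, "1"])
spec_pos1 <- cm["0", "0"] / sum(cm[, "0"])

print(round(c(accuracy = accuracy,
              sens_pos0 = sens_pos0, spec_pos0 = spec_pos0,
              sens_pos1 = sens_pos1, spec_pos1 = spec_pos1), 4))
```

With "0" as the positive class the computed sensitivity and specificity match the 0.3333 and 0.9213 in the output, which suggests that passing `positive = "1"` to confusionMatrix would give the labeling the asker expects.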

Feature Selection in caret rfe + sum with ROC

Question: I have been trying to apply recursive feature selection using the caret package. What I need is for rfe to use the AUC as its performance measure. After googling for a month I still cannot get the process working. Here is the code I have used:

    library(caret)
    library(doMC)
    registerDoMC(cores = 4)
    data(mdrr)
    subsets <- c(1:10)
    ctrl <- rfeControl(functions=caretFuncs, method = "cv", repeats =5, number = 10, returnResamp="final", verbose = TRUE)
    trainctrl <- trainControl(classProbs= TRUE)
    caretFuncs$summary <-

R: using ranger with caret, tuneGrid argument

Question: I'm using the caret package to analyse Random Forest models built using ranger. I can't figure out how to call the train function with the tuneGrid argument to tune the model parameters. I think I'm specifying the tuneGrid argument incorrectly, but can't figure out why it's wrong. Any help would be appreciated.

    data(iris)
    library(ranger)
    model_ranger <- ranger(Species ~ ., data = iris, num.trees = 500, mtry = 4, importance = 'impurity')
    library(caret)
    # my tuneGrid object:
    tgrid <- expand.grid( num
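A tuneGrid for train must contain exactly the tuning parameters that caret exposes for the chosen method, and `modelLookup("ranger")` lists them. The sketch below assumes a reasonably recent caret version, in which those parameters are `mtry`, `splitrule` and `min.node.size` (older versions may expose only `mtry`). The grid values and the 5-fold CV setting are arbitrary choices for illustration. Other ranger arguments such as `num.trees` are not tuning parameters and are passed through `...` instead.

```r
library(caret)  # assumes the ranger package is installed as well

data(iris)
set.seed(1)

# Show which parameters caret will accept in tuneGrid for this method.
print(modelLookup("ranger"))

# Assumed grid: one column per tuning parameter, named exactly as listed above.
tgrid <- expand.grid(
  mtry = 2:4,
  splitrule = "gini",
  min.node.size = 1,
  stringsAsFactors = FALSE
)

model_caret <- train(
  Species ~ ., data = iris,
  method = "ranger",
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid = tgrid,
  num.trees = 500,          # forwarded to ranger, not tuned
  importance = "impurity"   # forwarded to ranger, not tuned
)

print(model_caret$bestTune)
```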

Caret Package: Stratified Cross Validation in Train Function

Question: Is there a way to perform stratified cross-validation when using the train function to fit a model to a large imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen discussion of this topic but no definitive answer. Thanks in advance. Answer 1: There is a parameter called 'index' that lets the user specify the indices used for cross-validation.

    folds <- 4
    cvIndex <- createFolds(factor(training$Y), folds, returnTrain
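The approach in the answer can be sketched end to end as follows. This is an illustration under stated assumptions, not the answerer's full code: the `training` data frame is synthetic, the `glm` method is an arbitrary choice, and the fold count mirrors the `folds <- 4` above. createFolds samples within each class when given a factor, so each training index keeps the class proportions.

```r
library(caret)

set.seed(42)

# Hypothetical imbalanced data: 20 minority rows vs 180 majority rows.
training <- data.frame(
  x1 = rnorm(200),
  x2 = rnorm(200),
  Y  = factor(c(rep("minor", 20), rep("major", 180)))
)

folds <- 4

# returnTrain = TRUE yields the row indices used for TRAINING in each resample.
cvIndex <- createFolds(factor(training$Y), k = folds, returnTrain = TRUE)

# Hand the pre-computed, stratified indices to train via trainControl's index.
ctrl <- trainControl(method = "cv", number = folds, index = cvIndex)

fit <- train(Y ~ ., data = training, method = "glm", trControl = ctrl)

# Minority-class count in each training split; roughly equal across folds.
print(sapply(cvIndex, function(i) sum(training$Y[i] == "minor")))
```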

Creating folds for k-fold CV in R using Caret

Question: I'm trying to set up k-fold CV for several classification methods/hyperparameters using the data available at http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data. This set is made up of 208 rows, each with 60 attributes. I'm reading it into a data.frame using the read.table function. The next step is to split my data into

How to change metrics using the library(caret)?

I would like to change the metric from RMSE to RMSLE using the caret library. Given some sample data:

    ivar1<-rnorm(500, mean = 3, sd = 1)
    ivar2<-rnorm(500, mean = 4, sd = 1)
    ivar3<-rnorm(500, mean = 5, sd = 1)
    ivar4<-rnorm(500, mean = 4, sd = 1)
    dvar<-rpois(500, exp(3+ 0.1*ivar1 - 0.25*ivar2))
    data<-data.frame(dvar,ivar4,ivar3,ivar2,ivar1)
    ctrl <- rfeControl(functions=rfFuncs, method="cv", repeats = 5, verbose = FALSE, number=5)
    model <- rfe(data[,2:4], data[,1], sizes=c(1:4), rfeControl=ctrl)

Here I would like to switch to RMSLE while keeping the idea of the plot:

    plot <-ggplot(model,type=c("g",
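caret computes resampling metrics through a summary function of the form `function(data, lev = NULL, model = NULL)`, where `data` has `obs` and `pred` columns and the result is a named numeric vector. A minimal sketch of an RMSLE summary in that shape is given below. The name `rmsle_summary` is made up for illustration, and clamping negative predictions to zero before taking logs is an assumption, not something the question specifies.

```r
# Hypothetical summary function in the shape caret expects.
rmsle_summary <- function(data, lev = NULL, model = NULL) {
  # Assumption: clamp negative predictions so that log1p is defined.
  pred <- pmax(data$pred, 0)
  obs  <- data$obs
  c(RMSLE = sqrt(mean((log1p(pred) - log1p(obs))^2)))
}

# Small self-check on made-up values.
toy <- data.frame(obs = c(0, 9, 99), pred = c(0, 9, 99))
print(rmsle_summary(toy))  # perfect predictions give an RMSLE of 0
```

For the rfe setup in the question, this would presumably be plugged in by assigning `ctrl$functions$summary <- rmsle_summary` and calling rfe with `metric = "RMSLE"` and `maximize = FALSE`, so that smaller values are preferred.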