r-caret

Caret and KNN in R: predict function gives error

魔方 西西 submitted on 2019-12-03 23:01:48
Question: I am trying to predict with a simplified KNN model using the caret package in R. It always gives the same error, even in this very simple reproducible example:

    library(caret)
    set.seed(1)
    # generate training dataset "a"
    n = 10000
    a = matrix(rnorm(n*8,sd=1000000),nrow = n)
    y = round(runif(n))
    a = cbind(y,a)
    a = as.data.frame(a)
    a[,1] = as.factor(a[,1])
    colnames(a) = c("y",paste0("V",1:8))
    # estimate simple KNN model
    ctrl <- trainControl(method="none",repeats = 1)
    knnFit <- train(y ~ ., data = a,

How to implement a hold-out validation in R

二次信任 submitted on 2019-12-03 21:52:39
Let's say I'm using the Sonar data and I'd like to perform hold-out validation in R. I partitioned the data using createFolds from the caret package as folds <- createFolds(mydata$Class, k=5) . I would then like to use exactly the fold mydata[i] as test data and train a classifier using mydata[-i] as training data. My first thought was to use the train function, but I couldn't find any support for hold-out validation. Am I missing something here? Also, I'd like to be able to pass exactly the pre-defined folds as a parameter, instead of letting the function partition the data. Does anyone have any
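The fold-by-fold evaluation described in the question can be sketched with a plain loop over the list returned by createFolds. This is only an illustration under stated assumptions: the data frame below is a hypothetical stand-in for the Sonar data, the classifier is a base-R logistic regression rather than anything the asker named, and rows are selected with mydata[idx, ] (note that mydata[i] on a data frame would select columns, not rows).

```r
library(caret)

set.seed(1)
# Hypothetical stand-in for the Sonar data: two predictors, one two-level class
mydata <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
mydata$Class <- factor(ifelse(mydata$x1 + rnorm(100) > 0, "M", "R"))

# createFolds returns a list of held-out row indices, one element per fold
folds <- createFolds(mydata$Class, k = 5)

# Train on everything except fold i, then test on fold i
acc <- sapply(folds, function(test_idx) {
  train_set <- mydata[-test_idx, ]
  test_set  <- mydata[test_idx, ]
  fit  <- glm(Class ~ ., data = train_set, family = binomial)
  # glm models the probability of the second factor level
  prob <- predict(fit, newdata = test_set, type = "response")
  pred <- ifelse(prob > 0.5, levels(mydata$Class)[2], levels(mydata$Class)[1])
  mean(pred == test_set$Class)
})

print(acc)  # one accuracy value per pre-defined fold
```

Because the folds are created once and reused, every classifier evaluated this way sees exactly the same partitions.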

R caret / rfe variable selection for factors() AND NAs

不羁的心 submitted on 2019-12-03 21:45:30
Question: I have a data set with NAs sprinkled generously throughout. In addition, it has columns that need to be factors. I am using the rfe() function from the caret package to select variables. It seems that the functions= argument in rfe() with lmFuncs works for data with NAs but NOT for factor variables, while rfFuncs works for factor variables but NOT for NAs. Any suggestions for dealing with this? I tried model.matrix() but it seems to just cause more problems.

Answer 1: Because of inconsistent

How to predict on a new dataset using caretEnsemble package in R?

徘徊边缘 submitted on 2019-12-03 21:22:48
I am currently using the caretEnsemble package in R to combine multiple models trained in caret. I have obtained the list of final trained models (say model_list ) using the caretList function from the same package, as follows:

    model_list <- caretList(
      x = input_predictors,
      y = input_labels,
      metric = 'Accuracy',
      tuneList = list(
        randomForestModel = caretModelSpec(method='rf', tuneLength=1, preProcess=c('BoxCox', 'center', 'scale')),
        ldaModel = caretModelSpec(method='lda', tuneLength=1, preProcess=c('BoxCox', 'center', 'scale')),
        logisticRegressionModel = caretModelSpec(method='glm', tuneLength=1,

R Confusion Matrix sensitivity and specificity labeling

ⅰ亾dé卋堺 submitted on 2019-12-03 17:12:09
I am using R v3.3.2 and Caret 6.0.71 (i.e. latest versions) to construct a logistic regression classifier. I am using the confusionMatrix function to create stats for judging its performance.

    logRegConfMat <- confusionMatrix(logRegPrediction, valData[,"Seen"])

              Reference
    Prediction   0    1
             0  30   14
             1  60  164

    Accuracy    : 0.7239
    Sensitivity : 0.3333
    Specificity : 0.9213

The target value in my data (Seen) uses 1 for true and 0 for false. I assume the Reference (Ground truth) columns and Prediction (Classifier) rows in the
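The three statistics quoted in the question can be checked by hand from the four counts. The base-R sketch below (the helper name rates is made up for this illustration) shows that the reported sensitivity and specificity correspond to treating class "0" as the positive class, and that the two values swap when "1" is treated as positive. In caret itself, confusionMatrix takes a positive argument for choosing that class; no caret call is made here.

```r
# Counts copied from the question: rows are predictions, columns are reference
cm <- matrix(c(30, 60, 14, 164), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

# Hypothetical helper: sensitivity and specificity for a chosen positive class
rates <- function(cm, positive) {
  negative <- setdiff(colnames(cm), positive)
  c(Sensitivity = cm[positive, positive] / sum(cm[, positive]),
    Specificity = cm[negative, negative] / sum(cm[, negative]))
}

accuracy <- sum(diag(cm)) / sum(cm)

round(accuracy, 4)        # 0.7239
round(rates(cm, "0"), 4)  # Sensitivity 0.3333, Specificity 0.9213 (as reported)
round(rates(cm, "1"), 4)  # Sensitivity 0.9213, Specificity 0.3333
```

With class "0" as positive, sensitivity is 30 / (30 + 60) and specificity is 164 / (164 + 14), which matches the output in the question.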

Feature Selection in caret rfe + sum with ROC

ぃ、小莉子 submitted on 2019-12-03 14:02:55
Question: I have been trying to apply recursive feature selection using the caret package. What I need is for rfe to use the AUC as its performance measure. After googling for a month I still cannot get the process working. Here is the code I have used:

    library(caret)
    library(doMC)
    registerDoMC(cores = 4)
    data(mdrr)
    subsets <- c(1:10)
    ctrl <- rfeControl(functions=caretFuncs, method = "cv", repeats =5, number = 10, returnResamp="final", verbose = TRUE)
    trainctrl <- trainControl(classProbs= TRUE)
    caretFuncs$summary <-

R: using ranger with caret, tuneGrid argument

℡╲_俬逩灬. submitted on 2019-12-03 13:33:36
Question: I'm using the caret package to analyse Random Forest models built using ranger. I can't figure out how to call the train function with the tuneGrid argument to tune the model parameters. I think I'm specifying the tuneGrid argument incorrectly, but can't figure out why it's wrong. Any help would be appreciated.

    data(iris)
    library(ranger)
    model_ranger <- ranger(Species ~ ., data = iris, num.trees = 500, mtry = 4, importance = 'impurity')
    library(caret)
    # my tuneGrid object:
    tgrid <- expand.grid( num
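The asker's grid is cut off, so the sketch below does not try to reconstruct it. It only illustrates the general rule that the columns of tuneGrid must match the tuning parameters caret exposes for the chosen method, which modelLookup reports. The parameter names used here (mtry, splitrule, min.node.size) are an assumption that holds for recent caret releases and may differ in older ones; other ranger arguments such as num.trees are passed through train's `...` instead of the grid. The ranger package must be installed.

```r
library(caret)

# Which parameters does caret tune for method = "ranger"?
print(modelLookup("ranger"))

# Grid columns must use exactly those names (assumed: recent caret version)
tgrid <- expand.grid(
  mtry          = 2:4,
  splitrule     = "gini",
  min.node.size = c(1, 5),
  stringsAsFactors = FALSE
)

set.seed(1)
model_caret <- train(
  Species ~ ., data = iris,
  method    = "ranger",
  trControl = trainControl(method = "cv", number = 5),
  tuneGrid  = tgrid,
  num.trees = 500,          # not a tuning parameter: forwarded to ranger()
  importance = "impurity"   # likewise forwarded to ranger()
)

print(model_caret$bestTune)
```

If modelLookup lists different parameters on your installation, rename the grid columns to match its output.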

Caret Package: Stratified Cross Validation in Train Function

回眸只為那壹抹淺笑 submitted on 2019-12-03 12:34:39
Question: Is there a way to perform stratified cross-validation when using the train function to fit a model to a large imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen discussion of this topic but no definitive answer. Thanks in advance.

Answer 1: There is a parameter called 'index' that lets the user specify the indices used for cross-validation.

    folds <- 4
    cvIndex <- createFolds(factor(training$Y), folds, returnTrain
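The answer is cut off, so the following is a sketch of the idea it describes and not a reconstruction of its code. It assumes that returnTrain = TRUE is used, so that createFolds returns training-row indices (which is what trainControl's index argument expects), and it uses a hypothetical imbalanced data frame named training.

```r
library(caret)

set.seed(42)
# Hypothetical imbalanced data: 20 minority rows, 180 majority rows
training <- data.frame(
  X1 = rnorm(200),
  X2 = rnorm(200),
  Y  = factor(c(rep("minor", 20), rep("major", 180)))
)

folds <- 4
# For a factor outcome, createFolds samples within each class,
# so every fold keeps roughly the original class proportions
cvIndex <- createFolds(training$Y, k = folds, returnTrain = TRUE)

# Minority-class count in each training partition
minority_per_fold <- sapply(cvIndex, function(idx) sum(training$Y[idx] == "minor"))
print(minority_per_fold)

# Hand the pre-computed partitions to train() through trainControl
ctrl <- trainControl(method = "cv", number = folds, index = cvIndex)
# e.g. train(Y ~ ., data = training, method = "glm", trControl = ctrl)
```

Printing the per-fold minority counts is a quick way to confirm that no training partition has lost the rare class.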

Creating folds for k-fold CV in R using Caret

浪子不回头ぞ submitted on 2019-12-03 12:21:51
Question: I'm trying to run k-fold CV for several classification methods/hyperparameters using the data available at http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data. This set consists of 208 rows, each with 60 attributes. I'm reading it into a data.frame using the read.table function. The next step is to split my data into

How to change metrics using the library(caret)?

佐手、 submitted on 2019-12-03 12:16:06
I would like to change the metric from RMSE to RMSLE using the caret library. Given some sample data:

    ivar1<-rnorm(500, mean = 3, sd = 1)
    ivar2<-rnorm(500, mean = 4, sd = 1)
    ivar3<-rnorm(500, mean = 5, sd = 1)
    ivar4<-rnorm(500, mean = 4, sd = 1)
    dvar<-rpois(500, exp(3+ 0.1*ivar1 - 0.25*ivar2))
    data<-data.frame(dvar,ivar4,ivar3,ivar2,ivar1)
    ctrl <- rfeControl(functions=rfFuncs, method="cv", repeats = 5, verbose = FALSE, number=5)
    model <- rfe(data[,2:4], data[,1], sizes=c(1:4), rfeControl=ctrl)

Here I would like to switch to RMSLE while keeping the same kind of plot:

    plot <-ggplot(model,type=c("g",
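One possible way to get an RMSLE metric is a custom summary function with the same signature caret's default summary uses, (data, lev, model), where data carries obs and pred columns. The function name rmsleSummary and the choice to clamp negative predictions at zero are assumptions made for this sketch, not something the question specifies. The block uses base R only so the metric itself can be checked in isolation.

```r
# Hypothetical summary function: root mean squared logarithmic error.
# `data` is expected to have numeric columns `obs` and `pred`.
rmsleSummary <- function(data, lev = NULL, model = NULL) {
  # Assumption: clamp negative predictions so log1p() stays defined
  pred <- pmax(data$pred, 0)
  c(RMSLE = sqrt(mean((log1p(pred) - log1p(data$obs))^2)))
}

# Perfect predictions give an error of zero
perfect <- data.frame(obs = c(0, 9, 99), pred = c(0, 9, 99))
print(rmsleSummary(perfect))

# Predictions off by exactly one unit on the log1p scale give an error of one
off_by_one <- data.frame(obs = c(0, 0), pred = rep(exp(1) - 1, 2))
print(rmsleSummary(off_by_one))
```

To use it with rfe, one option is to copy the function list (funcs <- rfFuncs), set funcs$summary <- rmsleSummary, build the control object with rfeControl(functions = funcs, ...), and call rfe with metric = "RMSLE" and maximize = FALSE so that smaller values are preferred.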