r-caret

getting this error in Caret

99封情书 posted on 2019-11-27 19:30:22
Question: I'm getting the following error and I don't know what may have gone wrong. I'm using RStudio with R 3.1.3 on Windows 8.1, and the caret package for data mining. I have the following training data:

    str(training)
    'data.frame': 212300 obs. of 21 variables:
     $ FL_DATE_MDD_MMDD : int 101 101 101 101 101 101 101 101 101 101 ...
     $ FL_DATE : int 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 1012013 ...
     $ UNIQUE_CARRIER : Factor w/ 13 levels "9E","AA","AS"

Error when using predict() on a randomForest object trained with caret's train() using formula

只愿长相守 posted on 2019-11-27 16:21:57
Question: Using R 3.2.0 with caret 6.0-41 and randomForest 4.6-10 on a 64-bit Linux machine. When I call predict() on a randomForest object that was trained with caret's train() function using a formula, the call returns an error. When I train via randomForest() directly, and/or pass x= and y= rather than a formula, everything runs smoothly. Here is a working example:

    library(randomForest)
    library(caret)
    data(imports85)
    imp85 <- imports85[, c("stroke", "price", "fuelType",

Error in ConfusionMatrix the data and reference factors must have the same number of levels

半世苍凉 posted on 2019-11-27 13:53:00
Question: I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:

    Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

    prob <- 0.5 # Specify class split
    singleSplit <- createDataPartition(modellingData2$category, p=prob, times=1, list=FALSE)
    cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
    traindata <- modellingData2[singleSplit,]
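One common cause of this error is that the predictions and the reference are factors with different level sets, for example when a class never appears among the predictions. The sketch below uses made-up vectors (`reference` and `predicted` are hypothetical, not the asker's data) and base R only, to show how re-declaring the predictions with the reference's levels makes the two factors comparable. Whether this is what went wrong in the question above cannot be determined from the excerpt.

```r
# Hypothetical data: the reference has three classes, but the model
# only ever predicted two of them.
reference <- factor(c("a", "b", "c", "a", "b"))
predicted <- factor(c("a", "b", "a", "a", "b"))

nlevels(reference)  # 3
nlevels(predicted)  # 2

# Re-declare the predictions using the reference's level set.
predicted_aligned <- factor(as.character(predicted), levels = levels(reference))

# Both factors now share the same levels, so the cross-tabulation is square.
table(predicted = predicted_aligned, reference = reference)
```

It is also worth checking that the column name passed as the reference is spelled correctly, since `$` on a data frame returns NULL for a column that does not exist.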

parRF on caret not working for more than one core

感情迁移 posted on 2019-11-27 02:54:44
Question: parRF from the caret R package is not working for me with more than one core, which is quite ironic, given that the "par" in parRF stands for parallel. I'm on a Windows machine, in case that is relevant. I checked that I'm using the latest versions of caret and doParallel. I made a minimal example and give the results below. Any ideas? Source code:

    library(caret)
    library(doParallel)
    trCtrl <- trainControl(
      method = "repeatedcv"
      , number = 2
      , repeats = 5
      ,

Fully reproducible parallel models using caret

半腔热情 posted on 2019-11-27 02:51:32
When I run two random forests in caret, I get exactly the same results if I set a random seed:

    library(caret)
    library(doParallel)
    set.seed(42)
    myControl <- trainControl(method='cv', index=createFolds(iris$Species))
    set.seed(42)
    model1 <- train(Species~., iris, method='rf', trControl=myControl)
    set.seed(42)
    model2 <- train(Species~., iris, method='rf', trControl=myControl)
    > all.equal(predict(model1, type='prob'), predict(model2, type='prob'))
    [1] TRUE

However, if I register a parallel back-end to speed up the modeling, I get a different result each time I run the model:

    cl <- makeCluster
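caret's `trainControl()` has a `seeds` argument intended for this situation: a list with one integer vector per resample plus a single integer for the final model, so each worker sets its seed explicitly. The following is a minimal sketch, not the asker's code. `n_resamples` and `n_models` are names introduced here for illustration, and the value 3 for `n_models` assumes `train()` evaluates three tuning candidates, which should be checked against the actual tuning grid.

```r
library(caret)

n_resamples <- 10  # assumed: 10-fold cross-validation
n_models <- 3      # assumed number of tuning candidates per resample

# Build the seeds list: one integer vector per resample,
# plus one integer for the final model fit.
set.seed(42)
seeds <- vector(mode = "list", length = n_resamples + 1)
for (i in seq_len(n_resamples)) {
  seeds[[i]] <- sample.int(1000, n_models)
}
seeds[[n_resamples + 1]] <- sample.int(1000, 1)

myControl <- trainControl(method = "cv", number = n_resamples, seeds = seeds)
```

Passing the same `myControl` to repeated `train()` calls then fixes the seed used inside each resampling iteration, regardless of which worker runs it.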

Error when I try to predict class probabilities in R - caret

﹥>﹥吖頭↗ posted on 2019-11-27 02:05:06
Question: I've built a model using caret. When the training completed, I got the following warning:

    Warning message: In train.default(x, y, weights = w, ...) : At least one of the class levels are not valid R variables names; This may cause errors if class probabilities are generated because the variables names will be converted to: X0, X1

The names of the variables are:

    str(train)
    'data.frame': 7395 obs. of 30 variables:
     $ alchemy_category : Factor w/ 13 levels "arts_entertainment",..: 2 8 6 6 11 6
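The warning says that class levels such as "0" and "1" are not valid R variable names and would be converted to X0 and X1. A minimal base-R sketch of what that conversion looks like, and of renaming the levels before training, is given below. The vector `y` and the labels "no" and "yes" are hypothetical and stand in for whatever outcome column the model used.

```r
# Hypothetical outcome with levels "0" and "1", which are not valid R names.
y <- factor(c("0", "1", "1", "0"))

# make.names() shows how such levels are turned into syntactically valid names.
make.names(levels(y))  # "X0" "X1"

# Rename the levels up front so they are already valid names before training.
levels(y) <- c("no", "yes")  # maps "0" -> "no", "1" -> "yes"
levels(y)  # "no" "yes"
```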