r-caret

Different results with randomForest() and caret's randomForest (method = "rf")

Submitted by 青春壹個敷衍的年華 on 2019-12-03 03:15:38
I am new to caret, and I just want to ensure that I fully understand what it's doing. Toward that end, I've been attempting to replicate the results I get from a randomForest() model using caret's train() function with method = "rf". Unfortunately, I haven't been able to get matching results, and I'm wondering what I'm overlooking. I'll also add that, given that randomForest uses bootstrapping to generate the samples used to fit each of the ntree trees and estimates error from out-of-bag predictions, I'm a little fuzzy on the difference between specifying "oob" and "boot" in the trainControl function call
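For reference, a minimal sketch (on iris, not the asker's data) of the three fits being compared; with method = "oob" caret scores the forest by randomForest's own out-of-bag error, while method = "boot" draws additional bootstrap resamples on top of the forest, so the resampled estimates will generally differ even with the same seed.

library(caret)
library(randomForest)

set.seed(42)
rf_direct <- randomForest(Species ~ ., data = iris, ntree = 500)   # plain randomForest, OOB error

set.seed(42)
rf_oob <- train(Species ~ ., data = iris, method = "rf", ntree = 500,
                trControl = trainControl(method = "oob"))          # tune mtry by OOB error

set.seed(42)
rf_boot <- train(Species ~ ., data = iris, method = "rf", ntree = 500,
                 trControl = trainControl(method = "boot", number = 25))  # tune mtry by bootstrap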

Caret Package: Stratified Cross Validation in Train Function

Submitted by 百般思念 on 2019-12-03 02:57:46
Is there a way to perform stratified cross-validation when using the train function to fit a model to a large imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen discussion about this topic but no real definitive answer. Thanks in advance. There is a parameter called 'index' which lets the user specify the indices used for cross-validation:
folds <- 4
cvIndex <- createFolds(factor(training$Y), folds, returnTrain = T)
tc <- trainControl(index = cvIndex, method = 'cv', number = folds)
rfFit <- train(Y ~ ., data =
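The train() call in that excerpt is cut off; a minimal completion, where `training` and `Y` are the question's names and method = "rf" is only inferred from the variable name rfFit:

library(caret)

folds   <- 4
cvIndex <- createFolds(factor(training$Y), k = folds, returnTrain = TRUE)  # stratified folds
tc      <- trainControl(index = cvIndex, method = "cv", number = folds)
rfFit   <- train(Y ~ ., data = training, method = "rf", trControl = tc)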

Difference between varImp (caret) and importance (randomForest) for Random Forest

Submitted by 柔情痞子 on 2019-12-03 02:36:21
I do not understand the difference between the varImp function (caret package) and the importance function (randomForest package) for a random forest model. I fitted a simple RF classification model, and when computing variable importance I found that the "ranking" of predictors was not the same for both functions. Here is my code:
rfImp <- randomForest(Origin ~ ., data = TAll_CS, ntree = 2000, importance = TRUE)
importance(rfImp)
                            BREAST      LUNG MeanDecreaseAccuracy MeanDecreaseGini
Energy_GLCM_R1SC4NG3   -1.44116806 2.8918537            1.0929302        0.3712622
Contrast_GLCM_R1SC4NG3 -2.61146974 1.5848150 -0
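A minimal reproducible sketch (iris standing in for the asker's TAll_CS data) that puts the two functions side by side; as far as I can tell, the rankings can differ because the two functions summarise the permutation importances differently (caret's varImp for a classification forest keeps the per-class columns, while importance() also reports the aggregate MeanDecreaseAccuracy and MeanDecreaseGini).

library(randomForest)
library(caret)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)

importance(rf)            # per-class columns plus MeanDecreaseAccuracy / MeanDecreaseGini
importance(rf, type = 1)  # permutation importance (MeanDecreaseAccuracy) only
importance(rf, type = 2)  # Gini importance only
varImp(rf)                # caret's view of the same forest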

Custom metric (hmeasure) for summaryFunction caret classification

Submitted by 走远了吗. on 2019-12-03 00:47:06
I am trying to use the hmeasure metric (Hand, 2009) as my custom metric for training SVMs in caret. As I am relatively new to R, I tried to adapt the twoClassSummary function. All I need is to pass the true class labels and the predicted class probabilities from the model (an SVM) to the HMeasure function from the hmeasure package, instead of using ROC or other measures of classification performance in caret. For example, a call to the HMeasure function in R, HMeasure(true.class, predictedProbs[,2]), results in calculation of the H-measure. Using an adaptation of the twoClassSummary code below results in
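A sketch of one way to wire HMeasure into caret (hSummary is a made-up name; it assumes classProbs = TRUE so that the probability columns named after the class levels are present, and it treats the second level as the positive class):

library(caret)
library(hmeasure)

hSummary <- function(data, lev = NULL, model = NULL) {
  # 0/1 labels with lev[2] treated as the positive class, matching the
  # probability column used as the score
  y      <- as.numeric(data$obs == lev[2])
  scores <- data[, lev[2]]
  h      <- HMeasure(y, scores)$metrics$H
  c(H = h)
}

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE, summaryFunction = hSummary)

# then e.g. train(Class ~ ., data = ..., method = "svmRadial",
#                 metric = "H", trControl = ctrl)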

R - convert from categorical to numeric for KNN

Submitted by 孤人 on 2019-12-02 21:58:00
Question: I'm trying to use the caret package in R to apply KNN to the "abalone" database from UCI Machine Learning (link to the data). But it doesn't allow KNN to be used when there are categorical values. How do I convert the categorical values (in this database: "M", "F", "I") to numeric values, such as 1, 2, 3, respectively? Answer 1: When data are read in via read.table, the data in the first column are factors. Then data$iGender = as.integer(data$Gender) would work. If they are character, a detour
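A short sketch of both routes (`data` and the column name Gender follow the answer's naming; the abalone file itself has no header, so that name is an assumption about how the sex column was read in):

# integer codes 1/2/3 from a factor column, as in the answer
data$iGender <- as.integer(factor(data$Gender, levels = c("M", "F", "I")))

# or one-hot encode with caret, which KNN's distance calculation usually handles better
library(caret)
dv      <- dummyVars(~ Gender, data = data)
genderD <- predict(dv, newdata = data)
data    <- cbind(data[, setdiff(names(data), "Gender")], genderD)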

How to specify minbucket in caret train?

Submitted by 。_饼干妹妹 on 2019-12-02 20:43:56
Question: For the CART model, caret seems to provide tuning only of the complexity parameter. Is there a way to tune other parameters, such as minbucket? Answer 1: Arguments passed to the classification or regression routines are included in the dots parameter. As you want to include minbucket, the parameter control should be included inside train. As an example:
library("caret")
train(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "rpart",
      tuneGrid = data.frame(cp = c(0.01, 0.05)),
      control = rpart.control
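A runnable completion of that truncated call (minbucket = 10 is just an illustrative value, not from the answer):

library(caret)
library(rpart)   # provides the kyphosis data and rpart.control

fit <- train(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "rpart",
             tuneGrid = data.frame(cp = c(0.01, 0.05)),
             control = rpart.control(minbucket = 10))  # passed through ... to rpart
fit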

caret train() predicts very differently than predict.glm()

Submitted by 丶灬走出姿态 on 2019-12-02 13:55:19
I am trying to estimate a logistic regression using 10-fold cross-validation.
# import libraries
library(car); library(caret); library(e1071); library(verification)
# data import and preparation
data(Chile)
chile <- na.omit(Chile)                                  # remove NAs
chile <- chile[chile$vote == "Y" | chile$vote == "N", ]  # only "Y" and "N" required
chile$vote <- factor(chile$vote)      # required to remove unwanted levels
chile$income <- factor(chile$income)  # treat income as a factor
The goal is to estimate a glm model that predicts the outcome of vote ("Y" or "N") depending on the relevant explanatory variables and, based on
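The question's model formula is cut off; continuing from the data preparation above, a sketch of the comparison it seems to be after (the predictors chosen here are an assumption), which also shows the usual source of "very different" predictions: predict() on a train object returns class labels by default, while predict.glm() returns link-scale values unless type = "response".

ctrl <- trainControl(method = "cv", number = 10)
fit_caret <- train(vote ~ age + income + statusquo, data = chile,
                   method = "glm", family = binomial, trControl = ctrl)
fit_glm   <- glm(vote ~ age + income + statusquo, data = chile,
                 family = binomial)

head(predict(fit_caret, newdata = chile))                    # class labels "N" / "Y"
head(predict(fit_caret, newdata = chile, type = "prob"))     # class probabilities
head(predict(fit_glm,   newdata = chile, type = "response")) # P(vote == "Y")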

R's caret training errors when y is not a factor

Submitted by 家住魔仙堡 on 2019-12-02 04:52:33
Question: I am using RStudio with Kaggle's forest cover data and keep getting an error when trying to use the knn3 function in caret. Here is my code:
library(caret)
train <- read.csv("C:/data/forest_cover/train.csv", header=T)
trainingRows <- createDataPartition(train$Cover_Type, p=0.8, list=F)
head(trainingRows)
train_train <- train[trainingRows,]
train_test <- train[-trainingRows,]
knnfit <- knn3(train_train[,-56], train_train$Cover_Type)
This last line gives me this in the console: Error
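As the question title says, the problem is that y is not a factor: read.csv leaves Cover_Type as an integer, and knn3 requires a factor outcome. A sketch of the usual fix, reusing the asker's path and names:

library(caret)

train <- read.csv("C:/data/forest_cover/train.csv", header = TRUE)
train$Cover_Type <- factor(train$Cover_Type)   # knn3 needs a factor outcome

trainingRows <- createDataPartition(train$Cover_Type, p = 0.8, list = FALSE)
train_train  <- train[trainingRows, ]
train_test   <- train[-trainingRows, ]

# drop the outcome by name rather than by position (the original used column 56)
knnfit <- knn3(train_train[, setdiff(names(train_train), "Cover_Type")],
               train_train$Cover_Type)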

Caret Model random forest into PMML error

Submitted by 旧巷老猫 on 2019-12-02 02:29:33
I would like to export a caret random forest model using the pmml library so I can use it for predictions in Java. Here is a reproduction of the error I am getting:
data(iris)
require(caret)
require(pmml)
rfGrid2 <- expand.grid(.mtry = c(1, 2))
fitControl2 <- trainControl(method = "repeatedcv", number = NUMBER_OF_CV, repeats = REPEATES)
model.Test <- train(Species ~ ., data = iris, method = "rf", trControl = fitControl2,
                    ntree = NUMBER_OF_TREES, importance = TRUE, tuneGrid = rfGrid2)
print(model.Test)
pmml(model.Test)
Error in UseMethod("pmml") : no applicable method for 'pmml' applied to an
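pmml() has no method for caret's train class. A common workaround (an assumption on my part, not the asker's code) is to convert the fitted randomForest object stored in $finalModel, which the pmml package does know how to handle:

library(pmml)
library(XML)   # for saveXML()

# model.Test comes from the question's code above; $finalModel holds the
# underlying randomForest fit
rf_pmml <- pmml(model.Test$finalModel)
saveXML(rf_pmml, "model_test.pmml")

One caveat: because train() was called with a formula, the finalModel is fitted on the model-matrix columns; for iris these are just the original numeric predictors, but with factor predictors the exported variable names would need checking.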