r-caret

Different results with randomForest() and caret's randomForest (method = "rf")

Submitted by 青春壹個敷衍的年華 on 2019-12-03 03:15:38
I am new to caret, and I just want to ensure that I fully understand what it's doing. Toward that end, I've been attempting to replicate the results I get from a randomForest() model using caret's train() function with method = "rf". Unfortunately, I haven't been able to get matching results, and I'm wondering what I'm overlooking. I'll also add that, given that randomForest uses bootstrapping to generate the samples used to fit each of the ntree trees and estimates error from out-of-bag predictions, I'm a little fuzzy on the difference between specifying "oob" and "boot" in the trainControl function call
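For reference, a minimal sketch (on iris, not the asker's data) of the three fits being compared; with method = "oob" caret scores the forest by randomForest's own out-of-bag error, while method = "boot" draws additional bootstrap resamples on top of the forest, so the resampled estimates will generally differ even with the same seed.

library(caret)
library(randomForest)

set.seed(42)
rf_direct <- randomForest(Species ~ ., data = iris, ntree = 500)   # plain randomForest, OOB error

set.seed(42)
rf_oob <- train(Species ~ ., data = iris, method = "rf", ntree = 500,
                trControl = trainControl(method = "oob"))          # tune mtry by OOB error

set.seed(42)
rf_boot <- train(Species ~ ., data = iris, method = "rf", ntree = 500,
                 trControl = trainControl(method = "boot", number = 25))  # tune mtry by bootstrap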

Caret Package: Stratified Cross Validation in Train Function

Submitted by 百般思念 on 2019-12-03 02:57:46
Is there a way to perform stratified cross-validation when using the train function to fit a model to a large imbalanced data set? I know straightforward k-fold cross-validation is possible, but my categories are highly unbalanced. I've seen discussion about this topic but no real definitive answer. Thanks in advance. There is a parameter called 'index' which lets the user specify the indices used for cross-validation:
folds <- 4
cvIndex <- createFolds(factor(training$Y), folds, returnTrain = T)
tc <- trainControl(index = cvIndex, method = 'cv', number = folds)
rfFit <- train(Y ~ ., data =
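The train() call in that excerpt is cut off; a minimal completion, where `training` and `Y` are the question's names and method = "rf" is only inferred from the variable name rfFit:

library(caret)

folds   <- 4
cvIndex <- createFolds(factor(training$Y), k = folds, returnTrain = TRUE)  # stratified folds
tc      <- trainControl(index = cvIndex, method = "cv", number = folds)
rfFit   <- train(Y ~ ., data = training, method = "rf", trControl = tc)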

Difference between varImp (caret) and importance (randomForest) for Random Forest

Submitted by 柔情痞子 on 2019-12-03 02:36:21
I do not understand the difference between the varImp function (caret package) and the importance function (randomForest package) for a random forest model. I fitted a simple RF classification model, and when computing variable importance I found that the "ranking" of predictors was not the same for both functions. Here is my code:
rfImp <- randomForest(Origin ~ ., data = TAll_CS, ntree = 2000, importance = TRUE)
importance(rfImp)
                            BREAST      LUNG MeanDecreaseAccuracy MeanDecreaseGini
Energy_GLCM_R1SC4NG3   -1.44116806 2.8918537            1.0929302        0.3712622
Contrast_GLCM_R1SC4NG3 -2.61146974 1.5848150 -0
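A minimal reproducible sketch (iris standing in for the asker's TAll_CS data) that puts the two functions side by side; as far as I can tell, the rankings can differ because the two functions summarise the permutation importances differently (caret's varImp for a classification forest keeps the per-class columns, while importance() also reports the aggregate MeanDecreaseAccuracy and MeanDecreaseGini).

library(randomForest)
library(caret)

set.seed(1)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)

importance(rf)            # per-class columns plus MeanDecreaseAccuracy / MeanDecreaseGini
importance(rf, type = 1)  # permutation importance (MeanDecreaseAccuracy) only
importance(rf, type = 2)  # Gini importance only
varImp(rf)                # caret's view of the same forest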

Custom metric (hmeasure) for summaryFunction caret classification

Submitted by 走远了吗. on 2019-12-03 00:47:06
I am trying to use the hmeasure metric (Hand, 2009) as my custom metric for training SVMs in caret. As I am relatively new to R, I tried to adapt the twoClassSummary function. All I need is to pass the true class labels and the predicted class probabilities from the model (an SVM) to the HMeasure function from the hmeasure package, instead of using ROC or other measures of classification performance in caret. For example, a call to the HMeasure function in R, HMeasure(true.class, predictedProbs[,2]), results in calculation of the H-measure. Using an adaptation of the twoClassSummary code below results in
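A sketch of one way to wire HMeasure into caret (hSummary is a made-up name; it assumes classProbs = TRUE so that the probability columns named after the class levels are present, and it treats the second level as the positive class):

library(caret)
library(hmeasure)

hSummary <- function(data, lev = NULL, model = NULL) {
  # 0/1 labels with lev[2] treated as the positive class, matching the
  # probability column used as the score
  y      <- as.numeric(data$obs == lev[2])
  scores <- data[, lev[2]]
  h      <- HMeasure(y, scores)$metrics$H
  c(H = h)
}

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE, summaryFunction = hSummary)

# then e.g. train(Class ~ ., data = ..., method = "svmRadial",
#                 metric = "H", trControl = ctrl)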

R - convert from categorical to numeric for KNN

Submitted by 孤人 on 2019-12-02 21:58:00
Question: I'm trying to use the caret package in R to apply KNN to the "abalone" database from UCI Machine Learning (link to the data). But it doesn't allow KNN to be used when there are categorical values. How do I convert the categorical values (in this database: "M", "F", "I") to numeric values, such as 1, 2, 3, respectively? Answer 1: When data are read in via read.table, the data in the first column are factors. Then data$iGender = as.integer(data$Gender) would work. If they are character, a detour
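A short sketch of both routes (`data` and the column name Gender follow the answer's naming; the abalone file itself has no header, so that name is an assumption about how the sex column was read in):

# integer codes 1/2/3 from a factor column, as in the answer
data$iGender <- as.integer(factor(data$Gender, levels = c("M", "F", "I")))

# or one-hot encode with caret, which KNN's distance calculation usually handles better
library(caret)
dv      <- dummyVars(~ Gender, data = data)
genderD <- predict(dv, newdata = data)
data    <- cbind(data[, setdiff(names(data), "Gender")], genderD)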

How to specify minbucket in caret train?

Submitted by 。_饼干妹妹 on 2019-12-02 20:43:56
Question: For the CART model, caret seems to provide tuning only of the complexity parameter. Is there a way to tune other parameters, such as minbucket? Answer 1: Arguments passed to the classification or regression routines are included in the dots parameter. As you want to include minbucket, the parameter control should be included inside train. As an example:
library("caret")
train(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "rpart",
      tuneGrid = data.frame(cp = c(0.01, 0.05)),
      control = rpart.control
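A runnable completion of that truncated call (minbucket = 10 is just an illustrative value, not from the answer):

library(caret)
library(rpart)   # provides the kyphosis data and rpart.control

fit <- train(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "rpart",
             tuneGrid = data.frame(cp = c(0.01, 0.05)),
             control = rpart.control(minbucket = 10))  # passed through ... to rpart
fit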

caret train() predicts very differently than predict.glm()

Submitted by 丶灬走出姿态 on 2019-12-02 13:55:19
I am trying to estimate a logistic regression using 10-fold cross-validation.
# import libraries
library(car); library(caret); library(e1071); library(verification)
# data import and preparation
data(Chile)
chile <- na.omit(Chile)                                  # remove NAs
chile <- chile[chile$vote == "Y" | chile$vote == "N", ]  # only "Y" and "N" required
chile$vote <- factor(chile$vote)      # required to remove unwanted levels
chile$income <- factor(chile$income)  # treat income as a factor
The goal is to estimate a glm model that predicts the outcome of vote ("Y" or "N") depending on the relevant explanatory variables and, based on
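The question's model formula is cut off; continuing from the data preparation above, a sketch of the comparison it seems to be after (the predictors chosen here are an assumption), which also shows the usual source of "very different" predictions: predict() on a train object returns class labels by default, while predict.glm() returns link-scale values unless type = "response".

ctrl <- trainControl(method = "cv", number = 10)
fit_caret <- train(vote ~ age + income + statusquo, data = chile,
                   method = "glm", family = binomial, trControl = ctrl)
fit_glm   <- glm(vote ~ age + income + statusquo, data = chile,
                 family = binomial)

head(predict(fit_caret, newdata = chile))                    # class labels "N" / "Y"
head(predict(fit_caret, newdata = chile, type = "prob"))     # class probabilities
head(predict(fit_glm,   newdata = chile, type = "response")) # P(vote == "Y")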

R's caret training errors when y is not a factor

Submitted by 家住魔仙堡 on 2019-12-02 04:52:33
Question: I am using RStudio with Kaggle's forest cover data and keep getting an error when trying to use the knn3 function in caret. Here is my code:
library(caret)
train <- read.csv("C:/data/forest_cover/train.csv", header=T)
trainingRows <- createDataPartition(train$Cover_Type, p=0.8, list=F)
head(trainingRows)
train_train <- train[trainingRows,]
train_test <- train[-trainingRows,]
knnfit <- knn3(train_train[,-56], train_train$Cover_Type)
This last line gives me this in the console: Error
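As the question title says, the problem is that y is not a factor: read.csv leaves Cover_Type as an integer, and knn3 requires a factor outcome. A sketch of the usual fix, reusing the asker's path and names:

library(caret)

train <- read.csv("C:/data/forest_cover/train.csv", header = TRUE)
train$Cover_Type <- factor(train$Cover_Type)   # knn3 needs a factor outcome

trainingRows <- createDataPartition(train$Cover_Type, p = 0.8, list = FALSE)
train_train  <- train[trainingRows, ]
train_test   <- train[-trainingRows, ]

# drop the outcome by name rather than by position (the original used column 56)
knnfit <- knn3(train_train[, setdiff(names(train_train), "Cover_Type")],
               train_train$Cover_Type)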

Caret Model random forest into PMML error

Submitted by 旧巷老猫 on 2019-12-02 02:29:33
I would like to export a caret random forest model using the pmml library so I can use it for predictions in Java. Here is a reproduction of the error I am getting:
data(iris)
require(caret)
require(pmml)
rfGrid2 <- expand.grid(.mtry = c(1, 2))
fitControl2 <- trainControl(method = "repeatedcv", number = NUMBER_OF_CV, repeats = REPEATES)
model.Test <- train(Species ~ ., data = iris, method = "rf", trControl = fitControl2,
                    ntree = NUMBER_OF_TREES, importance = TRUE, tuneGrid = rfGrid2)
print(model.Test)
pmml(model.Test)
Error in UseMethod("pmml") : no applicable method for 'pmml' applied to an
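pmml() has no method for caret's train class. A common workaround (an assumption on my part, not the asker's code) is to convert the fitted randomForest object stored in $finalModel, which the pmml package does know how to handle:

library(pmml)
library(XML)   # for saveXML()

# model.Test comes from the question's code above; $finalModel holds the
# underlying randomForest fit
rf_pmml <- pmml(model.Test$finalModel)
saveXML(rf_pmml, "model_test.pmml")

One caveat: because train() was called with a formula, the finalModel is fitted on the model-matrix columns; for iris these are just the original numeric predictors, but with factor predictors the exported variable names would need checking.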