r-caret

Difference between varImp (caret) and importance (randomForest) for Random Forest

Submitted by 二次信任 on 2019-12-20 12:32:00

Question: I do not understand the difference between the varImp function (caret package) and the importance function (randomForest package) for a Random Forest model. I computed a simple RF classification model, and when computing variable importance I found that the "ranking" of the predictors was not the same for the two functions. Here is my code:

    rfImp <- randomForest(Origin ~ ., data = TAll_CS, ntree = 2000, importance = TRUE)
    importance(rfImp)
    BREAST LUNG MeanDecreaseAccuracy MeanDecreaseGini Energy …
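A likely source of the discrepancy (an assumption here, not confirmed by the excerpt): caret's varImp scales importances to a 0-100 range by default, while randomForest::importance reports the raw permutation and Gini measures. A minimal sketch for comparing the two on the rfImp model above:

```r
library(randomForest)
library(caret)

# Raw permutation importance (type = 1 -> MeanDecreaseAccuracy)
imp_rf <- importance(rfImp, type = 1)

# caret's varImp rescales to 0-100 by default; disable that to compare rankings
imp_caret <- varImp(rfImp, scale = FALSE)
```

If the rankings still differ, check whether one side is using Gini-based importance (type = 2) while the other uses permutation importance.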

Custom metric (hmeasure) for summaryFunction caret classification

Submitted by 不羁的心 on 2019-12-20 10:57:21

Question: I am trying to use the h-measure metric (Hand, 2009) as my custom metric for training SVMs in caret. As I am relatively new to R, I tried to adapt the twoClassSummary function. All I need is to pass the true class labels and the predicted class probabilities from the model (an SVM) to the HMeasure function from the hmeasure package, instead of using ROC or other measures of classification performance in caret. For example, a call to the HMeasure function in R: HMeasure(true.class, predictedProbs[,2]) …
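A hedged sketch of such a custom summaryFunction: the column conventions (data$obs, one probability column per class level) follow caret's usual summaryFunction contract, and the $metrics$H slot is assumed from the hmeasure package's return structure:

```r
library(hmeasure)
library(caret)

hSummary <- function(data, lev = NULL, model = NULL) {
  # data$obs: observed classes; data[, lev[2]]: predicted probability of the
  # second class level (assumed convention, mirroring twoClassSummary)
  res <- HMeasure(true.class = as.numeric(data$obs == lev[2]),
                  scores = data[, lev[2]])
  c(H = res$metrics$H)
}

ctrl <- trainControl(method = "cv", classProbs = TRUE,
                     summaryFunction = hSummary)
# then: train(..., trControl = ctrl, metric = "H", maximize = TRUE)
```

classProbs = TRUE is required so that the per-class probability columns exist inside the summary function.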

R understanding {caret} train(tuneLength = ) and SVM methods from {kernlab}

Submitted by 只谈情不闲聊 on 2019-12-20 10:34:49

Question: I am trying to better understand how train(tuneLength = ) works in {caret}. My confusion arose when trying to understand some of the differences between the SVM methods from {kernlab}. I have reviewed the documentation (here) and the caret training page (here). My toy example created five models using the iris dataset. The results are here, and the reproducible code is here (they are rather long, so I did not copy and paste them into the post). From the {caret} documentation: tuneLength — an integer …
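For reference, tuneLength tells caret how many candidate values to generate per tuning parameter; for kernlab's svmRadial method, caret typically varies C over that many values while fixing sigma via a kernlab::sigest estimate (the exact grid is version-dependent, so treat this as a sketch):

```r
library(caret)

set.seed(1)
fit <- train(Species ~ ., data = iris, method = "svmRadial",
             tuneLength = 5)  # 5 candidate values of C; sigma estimated once
fit$results                   # one row per candidate C
```

Printing fit$results makes it easy to see which parameters tuneLength actually expanded for a given method.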

caret train method not working (something is wrong for all accuracy results) for outcomes with >2 categories

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-20 03:26:18

Question: Hi, I know similar issues have been asked about before, but with no clear answer yet (or I tried the proposed solutions without success: "Caret error using GBM, but not without caret"; "Caret train method complains Something is wrong; all the RMSE metric values are missing"). I tried to use caret training methods to predict categorical outcomes (online data example below):

    library(mlbench)
    data(Sonar)
    str(Sonar[, 1:10])
    library(caret)
    set.seed(998)
    Sonar$rand <- rnorm(nrow(Sonar))  ## to randomly create the new 3 …
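One frequent cause of "something is wrong; all the accuracy results are missing" with multi-class outcomes (an assumption here, not confirmed by the truncated excerpt): factor levels that are not valid R variable names when classProbs = TRUE. A sketch continuing the Sonar example above, with a hypothetical 3-level outcome:

```r
library(mlbench)
library(caret)

data(Sonar)
set.seed(998)
Sonar$rand <- rnorm(nrow(Sonar))
Sonar$grp  <- cut(Sonar$rand, 3)   # levels like "(-2.3,-0.5]" are not valid R names
levels(Sonar$grp) <- make.names(levels(Sonar$grp))

ctrl <- trainControl(method = "cv", classProbs = TRUE)
fit  <- train(grp ~ . - rand - Class, data = Sonar, method = "rpart",
              trControl = ctrl)
```

Without the make.names() step, caret cannot build probability columns named after the levels, and every resampled metric ends up missing.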

Caret train method complains Something is wrong; all the RMSE metric values are missing

Submitted by 亡梦爱人 on 2019-12-19 17:45:17

Question: On numerous occasions I have been getting this error when trying to fit a gbm or rpart model. Finally I was able to reproduce it consistently using publicly available data. I have noticed that the error happens when using CV (or repeated CV); when I don't use any fit control, I don't get it. Can someone shed some light on why I keep getting this error consistently?

    fitControl <- trainControl("repeatedcv", repeats = 5)
    ds <- read.csv("http://www.math.smith.edu/r/data/help.csv")
    ds$sub <- as.factor(ds …
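A hedged sketch of one common fix (the column choice below is hypothetical, since the original code is truncated): make sure the outcome is a factor whose levels are valid R names before handing it to train() with resampling, so that no fold produces missing metric values.

```r
library(caret)

ds <- read.csv("http://www.math.smith.edu/r/data/help.csv")
ds$sub <- as.factor(ds$substance)          # hypothetical outcome column
levels(ds$sub) <- make.names(levels(ds$sub))

fitControl <- trainControl(method = "repeatedcv", repeats = 5)
fit <- train(sub ~ age + pcs + mcs, data = ds, method = "rpart",
             trControl = fitControl)
```

If the outcome were left as a character or numeric column, caret would attempt regression and report RMSE, which then comes back missing for classification-style data.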

How to preProcess features when some of them are factors?

Submitted by 三世轮回 on 2019-12-18 19:05:13

Question: (This question was migrated from Cross Validated because it can be answered on Stack Overflow; migrated 6 years ago.) My question is related to this one regarding categorical data (factors, in R terms) when using the caret package. I understand from the linked post that if you use the "formula interface", some features can be factors and the training will work fine. My question is: how can I scale the data with the preProcess() function? If I try to do it on a data frame with some columns as …
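One workaround (a sketch of a general approach, not necessarily the linked answer): apply preProcess() only to the numeric columns and leave the factor columns untouched. The data frame name df is a placeholder:

```r
library(caret)

num_cols <- sapply(df, is.numeric)             # df: your data frame
pp <- preProcess(df[, num_cols, drop = FALSE],
                 method = c("center", "scale"))
df[, num_cols] <- predict(pp, df[, num_cols, drop = FALSE])
# df now has scaled numeric columns and unchanged factor columns
```

The scaled frame can then be passed to train() via the formula interface, which expands the factors to dummy variables on its own.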

Dummy variables and preProcess

Submitted by 时光怂恿深爱的人放手 on 2019-12-18 16:31:13

Question: I have a data frame with some dummy variables that I want to use as a training set for glmnet. Since I'm using glmnet, I want to center and scale the features using the preProcess option of the caret train function, but I don't want this transformation applied to the dummy variables. Is there a way to prevent the transformation of these variables?

Answer 1: There is not (currently) a way to do this besides writing a custom model to do so (see the example with PLS and RF near the end). I'm …
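Short of the custom-model route mentioned in the answer, a pragmatic sketch (under the assumption that dummy columns contain only 0/1) is to scale the non-dummy columns by hand before calling train(), and skip preProcess inside train(). X is a placeholder for the predictor data frame:

```r
library(caret)

is_dummy <- sapply(X, function(col) all(col %in% c(0, 1)))  # assumed encoding
pp <- preProcess(X[, !is_dummy, drop = FALSE],
                 method = c("center", "scale"))
X[, !is_dummy] <- predict(pp, X[, !is_dummy, drop = FALSE])
# now call train(x = X, y = y, method = "glmnet") without preProcess
```

Note the trade-off: scaling outside train() means the resampling no longer re-estimates the centering/scaling within each fold.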

Fit a no-intercept model in caret

Submitted by 这一生的挚爱 on 2019-12-18 13:05:49

Question: In R, I specify a model with no intercept as follows:

    data(iris)
    lmFit <- lm(Sepal.Length ~ 0 + Petal.Length + Petal.Width, data = iris)
    > round(coef(lmFit), 2)
    Petal.Length  Petal.Width
            2.86        -4.48

However, if I fit the same model with caret, the resulting model includes an intercept:

    library(caret)
    caret_lmFit <- train(Sepal.Length ~ 0 + Petal.Length + Petal.Width, data = iris, "lm")
    > round(coef(caret_lmFit$finalModel), 2)
    (Intercept) Petal.Length  Petal.Width
           4.19         0.54        -0.32

How do I tell caret::train …
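In recent caret versions the "lm" method exposes an intercept tuning parameter, which should achieve this (worth verifying against your installed caret version); a sketch:

```r
library(caret)

data(iris)
caret_lmFit <- train(Sepal.Length ~ 0 + Petal.Length + Petal.Width,
                     data = iris, method = "lm",
                     tuneGrid = data.frame(intercept = FALSE))
round(coef(caret_lmFit$finalModel), 2)
```

This is needed because caret rebuilds the model matrix internally, so the `~ 0 + ...` in the formula alone does not survive the round trip.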

ROC curve from training data in caret

Submitted by 痞子三分冷 on 2019-12-18 10:03:16

Question: Using the R package caret, how can I generate a ROC curve based on the cross-validation results of the train() function? Say I do the following:

    data(Sonar)
    ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary, classProbs = TRUE)
    rfFit <- train(Class ~ ., data = Sonar, method = "rf", preProc = c("center", "scale"), trControl = ctrl)

The training function goes over a range of the mtry parameter and calculates the ROC AUC. I would like to see the associated ROC curve -- how do I do that? Note: if …
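One common approach (a sketch, assuming the pROC package): ask trainControl to save the held-out predictions, then build the curve from the predictions of the winning mtry:

```r
library(caret)
library(pROC)
library(mlbench)

data(Sonar)
ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary,
                     classProbs = TRUE, savePredictions = TRUE)
rfFit <- train(Class ~ ., data = Sonar, method = "rf",
               preProc = c("center", "scale"), trControl = ctrl)

# Keep only the held-out predictions belonging to the best mtry, then plot
best <- merge(rfFit$pred, rfFit$bestTune)
plot(roc(best$obs, best$M))   # "M" is one of Sonar's two class levels
```

This pools the held-out predictions across folds into a single cross-validated ROC curve; plotting one curve per fold from the same data frame is also possible.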

Warning message: “missing values in resampled performance measures” in caret train() using rpart

Submitted by 谁都会走 on 2019-12-17 15:56:06

Question: I am using the caret package to train a model with the rpart package:

    tr <- train(y ~ ., data = trainingDATA, method = "rpart")

The data has no missing values or NAs, but when running the command a warning message comes up:

    Warning message:
    In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
      There were missing values in resampled performance measures.

Does anyone know (or could point me to where to find an answer) what this warning means? I know it is telling me that there …
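The warning usually means that in some resamples the fitted model produced degenerate predictions (for rpart, typically a tree with no splits at a large cp value), so a performance metric came out NA for those folds rather than anything being wrong with the input data. A sketch of one way to investigate (the cp values are illustrative):

```r
library(caret)

# A wider cp grid can reveal whether large cp values were producing
# root-only trees in some folds
tr <- train(y ~ ., data = trainingDATA, method = "rpart",
            tuneGrid = data.frame(cp = c(0.001, 0.01, 0.05)))
tr$resample   # inspect the per-fold metrics for NAs
```

If the NAs disappear for small cp values, the warning was caused by the default grid rather than by the data.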