r-caret

Difference between varImp (caret) and importance (randomForest) for Random Forest

Submitted by 二次信任 on 2019-12-20 12:32:00

Question: I do not understand the difference between the varImp function (caret package) and the importance function (randomForest package) for a Random Forest model. I computed a simple RF classification model, and when computing variable importance I found that the "ranking" of the predictors was not the same for the two functions. Here is my code:

    rfImp <- randomForest(Origin ~ ., data = TAll_CS, ntree = 2000, importance = TRUE)
    importance(rfImp)
    BREAST LUNG MeanDecreaseAccuracy MeanDecreaseGini Energy …
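A likely source of the discrepancy (an assumption here, not confirmed by the excerpt): caret's varImp scales importances to a 0-100 range by default, while randomForest::importance reports the raw permutation and Gini measures. A minimal sketch for comparing the two on the rfImp model above:

```r
library(randomForest)
library(caret)

# Raw permutation importance (type = 1 -> MeanDecreaseAccuracy)
imp_rf <- importance(rfImp, type = 1)

# caret's varImp rescales to 0-100 by default; disable that to compare rankings
imp_caret <- varImp(rfImp, scale = FALSE)
```

If the rankings still differ, check whether one side is using Gini-based importance (type = 2) while the other uses permutation importance.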

Custom metric (hmeasure) for summaryFunction caret classification

Submitted by 不羁的心 on 2019-12-20 10:57:21

Question: I am trying to use the h-measure metric (Hand, 2009) as my custom metric for training SVMs in caret. As I am relatively new to R, I tried to adapt the twoClassSummary function. All I need is to pass the true class labels and the predicted class probabilities from the model (an SVM) to the HMeasure function from the hmeasure package, instead of using ROC or other measures of classification performance in caret. For example, a call to the HMeasure function in R: HMeasure(true.class, predictedProbs[,2]) …
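A hedged sketch of such a custom summaryFunction: the column conventions (data$obs, one probability column per class level) follow caret's usual summaryFunction contract, and the $metrics$H slot is assumed from the hmeasure package's return structure:

```r
library(hmeasure)
library(caret)

hSummary <- function(data, lev = NULL, model = NULL) {
  # data$obs: observed classes; data[, lev[2]]: predicted probability of the
  # second class level (assumed convention, mirroring twoClassSummary)
  res <- HMeasure(true.class = as.numeric(data$obs == lev[2]),
                  scores = data[, lev[2]])
  c(H = res$metrics$H)
}

ctrl <- trainControl(method = "cv", classProbs = TRUE,
                     summaryFunction = hSummary)
# then: train(..., trControl = ctrl, metric = "H", maximize = TRUE)
```

classProbs = TRUE is required so that the per-class probability columns exist inside the summary function.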

R understanding {caret} train(tuneLength = ) and SVM methods from {kernlab}

Submitted by 只谈情不闲聊 on 2019-12-20 10:34:49

Question: I am trying to better understand how train(tuneLength = ) works in {caret}. My confusion arose when trying to understand some of the differences between the SVM methods from {kernlab}. I have reviewed the documentation (here) and the caret training page (here). My toy example created five models using the iris dataset. The results are here, and the reproducible code is here (they are rather long, so I did not copy and paste them into the post). From the {caret} documentation: tuneLength — an integer …
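For reference, tuneLength tells caret how many candidate values to generate per tuning parameter; for kernlab's svmRadial method, caret typically varies C over that many values while fixing sigma via a kernlab::sigest estimate (the exact grid is version-dependent, so treat this as a sketch):

```r
library(caret)

set.seed(1)
fit <- train(Species ~ ., data = iris, method = "svmRadial",
             tuneLength = 5)  # 5 candidate values of C; sigma estimated once
fit$results                   # one row per candidate C
```

Printing fit$results makes it easy to see which parameters tuneLength actually expanded for a given method.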

caret train method not working (something is wrong for all accuracy results) for outcomes with >2 categories

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-20 03:26:18

Question: Hi, I know similar issues have been asked about before, but with no clear answer yet (or I tried the proposed solutions without success: "Caret error using GBM, but not without caret"; "Caret train method complains Something is wrong; all the RMSE metric values are missing"). I tried to use caret training methods to predict categorical outcomes (online data example below):

    library(mlbench)
    data(Sonar)
    str(Sonar[, 1:10])
    library(caret)
    set.seed(998)
    Sonar$rand <- rnorm(nrow(Sonar))  ## to randomly create the new 3 …
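One frequent cause of "something is wrong; all the accuracy results are missing" with multi-class outcomes (an assumption here, not confirmed by the truncated excerpt): factor levels that are not valid R variable names when classProbs = TRUE. A sketch continuing the Sonar example above, with a hypothetical 3-level outcome:

```r
library(mlbench)
library(caret)

data(Sonar)
set.seed(998)
Sonar$rand <- rnorm(nrow(Sonar))
Sonar$grp  <- cut(Sonar$rand, 3)   # levels like "(-2.3,-0.5]" are not valid R names
levels(Sonar$grp) <- make.names(levels(Sonar$grp))

ctrl <- trainControl(method = "cv", classProbs = TRUE)
fit  <- train(grp ~ . - rand - Class, data = Sonar, method = "rpart",
              trControl = ctrl)
```

Without the make.names() step, caret cannot build probability columns named after the levels, and every resampled metric ends up missing.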

Caret train method complains Something is wrong; all the RMSE metric values are missing

Submitted by 亡梦爱人 on 2019-12-19 17:45:17

Question: On numerous occasions I have been getting this error when trying to fit a gbm or rpart model. Finally I was able to reproduce it consistently using publicly available data. I have noticed that the error happens when using CV (or repeated CV); when I don't use any fit control, I don't get it. Can someone shed some light on why I keep getting this error consistently?

    fitControl <- trainControl("repeatedcv", repeats = 5)
    ds <- read.csv("http://www.math.smith.edu/r/data/help.csv")
    ds$sub <- as.factor(ds …
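A hedged sketch of one common fix (the column choice below is hypothetical, since the original code is truncated): make sure the outcome is a factor whose levels are valid R names before handing it to train() with resampling, so that no fold produces missing metric values.

```r
library(caret)

ds <- read.csv("http://www.math.smith.edu/r/data/help.csv")
ds$sub <- as.factor(ds$substance)          # hypothetical outcome column
levels(ds$sub) <- make.names(levels(ds$sub))

fitControl <- trainControl(method = "repeatedcv", repeats = 5)
fit <- train(sub ~ age + pcs + mcs, data = ds, method = "rpart",
             trControl = fitControl)
```

If the outcome were left as a character or numeric column, caret would attempt regression and report RMSE, which then comes back missing for classification-style data.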

How to preProcess features when some of them are factors?

Submitted by 三世轮回 on 2019-12-18 19:05:13

Question: (This question was migrated from Cross Validated because it can be answered on Stack Overflow; migrated 6 years ago.) My question is related to this one regarding categorical data (factors, in R terms) when using the caret package. I understand from the linked post that if you use the "formula interface", some features can be factors and the training will work fine. My question is: how can I scale the data with the preProcess() function? If I try to do it on a data frame with some columns as …
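One workaround (a sketch of a general approach, not necessarily the linked answer): apply preProcess() only to the numeric columns and leave the factor columns untouched. The data frame name df is a placeholder:

```r
library(caret)

num_cols <- sapply(df, is.numeric)             # df: your data frame
pp <- preProcess(df[, num_cols, drop = FALSE],
                 method = c("center", "scale"))
df[, num_cols] <- predict(pp, df[, num_cols, drop = FALSE])
# df now has scaled numeric columns and unchanged factor columns
```

The scaled frame can then be passed to train() via the formula interface, which expands the factors to dummy variables on its own.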

Dummy variables and preProcess

Submitted by 时光怂恿深爱的人放手 on 2019-12-18 16:31:13

Question: I have a data frame with some dummy variables that I want to use as a training set for glmnet. Since I'm using glmnet, I want to center and scale the features using the preProcess option of the caret train function, but I don't want this transformation applied to the dummy variables. Is there a way to prevent the transformation of these variables?

Answer 1: There is not (currently) a way to do this besides writing a custom model to do so (see the example with PLS and RF near the end). I'm …
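Short of the custom-model route mentioned in the answer, a pragmatic sketch (under the assumption that dummy columns contain only 0/1) is to scale the non-dummy columns by hand before calling train(), and skip preProcess inside train(). X is a placeholder for the predictor data frame:

```r
library(caret)

is_dummy <- sapply(X, function(col) all(col %in% c(0, 1)))  # assumed encoding
pp <- preProcess(X[, !is_dummy, drop = FALSE],
                 method = c("center", "scale"))
X[, !is_dummy] <- predict(pp, X[, !is_dummy, drop = FALSE])
# now call train(x = X, y = y, method = "glmnet") without preProcess
```

Note the trade-off: scaling outside train() means the resampling no longer re-estimates the centering/scaling within each fold.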

Fit a no-intercept model in caret

Submitted by 这一生的挚爱 on 2019-12-18 13:05:49

Question: In R, I specify a model with no intercept as follows:

    data(iris)
    lmFit <- lm(Sepal.Length ~ 0 + Petal.Length + Petal.Width, data = iris)
    > round(coef(lmFit), 2)
    Petal.Length  Petal.Width
            2.86        -4.48

However, if I fit the same model with caret, the resulting model includes an intercept:

    library(caret)
    caret_lmFit <- train(Sepal.Length ~ 0 + Petal.Length + Petal.Width, data = iris, "lm")
    > round(coef(caret_lmFit$finalModel), 2)
    (Intercept) Petal.Length  Petal.Width
           4.19         0.54        -0.32

How do I tell caret::train …
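In recent caret versions the "lm" method exposes an intercept tuning parameter, which should achieve this (worth verifying against your installed caret version); a sketch:

```r
library(caret)

data(iris)
caret_lmFit <- train(Sepal.Length ~ 0 + Petal.Length + Petal.Width,
                     data = iris, method = "lm",
                     tuneGrid = data.frame(intercept = FALSE))
round(coef(caret_lmFit$finalModel), 2)
```

This is needed because caret rebuilds the model matrix internally, so the `~ 0 + ...` in the formula alone does not survive the round trip.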

ROC curve from training data in caret

Submitted by 痞子三分冷 on 2019-12-18 10:03:16

Question: Using the R package caret, how can I generate a ROC curve based on the cross-validation results of the train() function? Say I do the following:

    data(Sonar)
    ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary, classProbs = TRUE)
    rfFit <- train(Class ~ ., data = Sonar, method = "rf", preProc = c("center", "scale"), trControl = ctrl)

The training function goes over a range of the mtry parameter and calculates the ROC AUC. I would like to see the associated ROC curve -- how do I do that? Note: if …
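One common approach (a sketch, assuming the pROC package): ask trainControl to save the held-out predictions, then build the curve from the predictions of the winning mtry:

```r
library(caret)
library(pROC)
library(mlbench)

data(Sonar)
ctrl <- trainControl(method = "cv", summaryFunction = twoClassSummary,
                     classProbs = TRUE, savePredictions = TRUE)
rfFit <- train(Class ~ ., data = Sonar, method = "rf",
               preProc = c("center", "scale"), trControl = ctrl)

# Keep only the held-out predictions belonging to the best mtry, then plot
best <- merge(rfFit$pred, rfFit$bestTune)
plot(roc(best$obs, best$M))   # "M" is one of Sonar's two class levels
```

This pools the held-out predictions across folds into a single cross-validated ROC curve; plotting one curve per fold from the same data frame is also possible.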

Warning message: “missing values in resampled performance measures” in caret train() using rpart

Submitted by 谁都会走 on 2019-12-17 15:56:06

Question: I am using the caret package to train a model with the rpart package:

    tr <- train(y ~ ., data = trainingDATA, method = "rpart")

The data has no missing values or NAs, but when running the command a warning message comes up:

    Warning message:
    In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
      There were missing values in resampled performance measures.

Does anyone know (or could point me to where to find an answer) what this warning means? I know it is telling me that there …
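The warning usually means that in some resamples the fitted model produced degenerate predictions (for rpart, typically a tree with no splits at a large cp value), so a performance metric came out NA for those folds rather than anything being wrong with the input data. A sketch of one way to investigate (the cp values are illustrative):

```r
library(caret)

# A wider cp grid can reveal whether large cp values were producing
# root-only trees in some folds
tr <- train(y ~ ., data = trainingDATA, method = "rpart",
            tuneGrid = data.frame(cp = c(0.001, 0.01, 0.05)))
tr$resample   # inspect the per-fold metrics for NAs
```

If the NAs disappear for small cp values, the warning was caused by the default grid rather than by the data.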