r-caret

Why is caret train taking up so much memory?

倾然丶 夕夏残阳落幕 Submitted on 2019-12-29 14:18:07
Question: When I train just using glm, everything works, and I don't even come close to exhausting memory. But when I run train(..., method='glm'), I run out of memory. Is this because train is storing a lot of data for each iteration of the cross-validation (or whatever the trControl procedure is)? I'm looking at trainControl and I can't find how to prevent this... any hints? I only care about the performance summary and maybe the predicted responses. (I know it's not related to storing data from …
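One way to cut the memory footprint is to tell caret not to keep copies of the data and resampling results in the returned object. A minimal sketch, assuming a reasonably recent caret release (the exact option values accepted, e.g. "none", can vary between versions); 'myData' and 'y' are placeholders for the data in the question:

    library(caret)

    # Keep as little as possible in the fitted train object:
    #   returnData = FALSE       -> don't store a copy of the training data
    #   returnResamp = "none"    -> don't keep per-resample performance rows
    #   savePredictions = "none" -> don't keep hold-out predictions
    #   trim = TRUE              -> strip final-model components not needed for predict()
    lean_ctrl <- trainControl(method = "cv",
                              number = 10,
                              returnData = FALSE,
                              returnResamp = "none",
                              savePredictions = "none",
                              trim = TRUE)

    fit <- train(y ~ ., data = myData,
                 method = "glm",
                 trControl = lean_ctrl)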

R Caret Package error imputing data with Pre-Process function

狂风中的少年 Submitted on 2019-12-25 05:31:46
Question: I have a dataset (training/testing) with missing data and I would like to impute the data before classification. I tried using the caret package and the function preProcess. I want to impute data using the predictor variables of the training set, and impute data on the testing set using only knowledge of the training set, without using the predictor of the testing set (which I should not know).

    p = preProcess(x = training, method = "knnImpute", k = 10)
    pred = predict(object = p, newdata = …
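This is the pattern preProcess is designed for: fit the imputation on the training predictors only, then apply that same fitted transformation to both sets. A minimal sketch, assuming data frames named training and testing that share predictor columns and an outcome column hypothetically called Class:

    library(caret)

    # The outcome must not leak into the imputation, so drop it first.
    # "Class" is a hypothetical outcome-column name.
    predictor_cols <- setdiff(names(training), "Class")

    # Fit the imputation model on the training predictors only.
    # Note: knnImpute also centers and scales the predictors as a side effect.
    pp <- preProcess(training[, predictor_cols], method = "knnImpute", k = 10)

    # Apply the same training-derived imputation to both sets.
    train_imp <- predict(pp, newdata = training[, predictor_cols])
    test_imp  <- predict(pp, newdata = testing[, predictor_cols])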

Sorting features based on their importance in CARET package

柔情痞子 Submitted on 2019-12-25 04:20:12
Question: In the caret package, the help for the related varImp() says: "Partial Least Squares: the variable importance measure here is based on weighted sums of the absolute regression coefficients. The weights are a function of the reduction of the sums of squares across the number of PLS components and are computed separately for each outcome. Therefore, the contribution of the coefficients are weighted proportionally to the reduction in the sums of squares." Below is the output of variable …
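To actually sort predictors by this measure, varImp() on a fitted train object returns an object whose $importance slot is a plain data frame that can be ordered. A minimal sketch, using iris as a stand-in dataset for the questioner's data:

    library(caret)

    # Fit a cross-validated PLS model.
    fit <- train(Species ~ ., data = iris,
                 method = "pls",
                 trControl = trainControl(method = "cv", number = 5))

    # $importance is a data frame (for PLS classification, one column per
    # outcome class); sort by the first column, highest importance first.
    imp <- varImp(fit)$importance
    imp_sorted <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
    head(imp_sorted)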

caret coefficients of cross validated set

末鹿安然 Submitted on 2019-12-25 01:56:01
Question: Is it possible to get the coefficients of all the cross-validation sets from the R caret package?

    library(MASS)    # for mvrnorm
    library(caret)

    set.seed(1)
    mu <- rep(0, 4)
    Sigma <- matrix(.7, nrow = 4, ncol = 4)
    diag(Sigma) <- 1
    rawvars <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)
    d <- as.ordered(as.numeric(rawvars[, 1] > 0.5))
    d[1:200] <- 1
    df <- data.frame(rawvars, d)
    ind <- sample(1:nrow(df), 500)
    train <- df[ind, ]
    test <- df[-ind, ]
    trControl <- trainControl(method = "repeatedcv", repeats = 1,
                              classProb = T, summaryFunction = twoClassSummary)
    fit …
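caret keeps only the final model in the train object, but the row indices used for each resample are stored in fit$control$index, so one hedged approach is to refit the same kind of model on each fold and collect the coefficients. A sketch, assuming a binomial glm is a reasonable stand-in for the (truncated) model in the question, with 'fit' the train object and 'train' the data frame from the code above:

    # fit$control$index holds the training-row indices of each resample.
    fold_coefs <- lapply(fit$control$index, function(idx) {
      m <- glm(d ~ ., data = train[idx, ], family = binomial)
      coef(m)
    })

    # One named coefficient vector per resample:
    str(fold_coefs)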

CHAID error using caret in R: model fit failed for Resample01: alpha2=0.05, alpha3=-1, alpha4=0.05 Error : is.factor(x) is not TRUE

≯℡__Kan透↙ Submitted on 2019-12-24 21:44:14
Question: CHAID error using caret in R: model fit failed for Resample01: alpha2=0.05, alpha3=-1, alpha4=0.05 Error : is.factor(x) is not TRUE. I'm getting the error above when trying to run a CHAID model in caret. The model runs fine with this data just by using the CHAID function. Any suggestions? Code below:

    model_weights <- ifelse(as.character(train_data$outcome) == "Sucess", 5.4, 1)
    model_tree_caret_cost = caret::train(outcome ~ ., data = train_data,
                                         method = "chaid",
                                         #tuneGrid = tunegrid,
                                         #costs = …
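A likely cause: CHAID requires every predictor to be a factor, but train's formula interface expands factors into numeric dummy columns before fitting, which is exactly what is.factor(x) then rejects. A hedged sketch of the usual workaround, the x/y (non-formula) interface, which passes the data frame through untouched:

    library(caret)

    # Keep predictors as factors; the x/y interface does not dummy-code them.
    x <- train_data[, setdiff(names(train_data), "outcome")]
    x[] <- lapply(x, as.factor)   # ensure every predictor really is a factor
    y <- train_data$outcome

    model_tree_caret <- caret::train(x = x, y = y,
                                     method = "chaid",
                                     weights = model_weights)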

Using linear regression (lm) in R caret, how do I force the intercept through 0? [duplicate]

空扰寡人 Submitted on 2019-12-24 08:57:20
Question: This question already has answers here: Fit a no-intercept model in caret (2 answers). Closed 3 months ago. I'm trying to use R caret to perform cross-validation of my linear regression models. In some cases I want to force the intercept through 0. I have tried the following, using the standard lm syntax:

    library(caret)

    regressControl <- trainControl(method = "repeatedcv",
                                   number = 4,
                                   repeats = 5)
    regress <- train(y ~ 0 + x,
                     data = myData,
                     method = "lm",
                     trControl = regressControl)

    Call: lm(formula = …
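In caret, the intercept is exposed as a tuning parameter of the "lm" method, so rather than encoding it in the formula (which train's formula machinery can silently undo), it can be set through tuneGrid. A minimal sketch using the same placeholder names as the question:

    library(caret)

    regressControl <- trainControl(method = "repeatedcv",
                                   number = 4,
                                   repeats = 5)

    # 'intercept' is the lone tuning parameter of caret's "lm" method;
    # setting it to FALSE forces the fit through the origin.
    regress <- train(y ~ x,
                     data = myData,
                     method = "lm",
                     trControl = regressControl,
                     tuneGrid = data.frame(intercept = FALSE))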

Formula vs non-formula interface in train()

不羁岁月 Submitted on 2019-12-24 08:17:26
Question: [I looked into similar threads here and on GitHub, and none of the issues suggested by Max and others seem to relate to my case.] I have seen some people here reporting that the formula interface fails while the non-formula interface works fine for them. My problem is the opposite. The train() call below, using the formula interface, works perfectly:

    glmTune <- train(class ~ ., data = trainData,
                     method = "glmnet",
                     trControl = train.control,
                     tuneGrid = tune.grid)

This one below gives NA errors:

    predictors …
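The usual difference between the two interfaces: the formula interface runs the data through model.matrix(), dummy-coding any factor predictors into the numeric matrix glmnet requires, whereas the x/y interface passes columns through as-is. A hedged sketch of making the non-formula call equivalent, using caret's dummyVars with the names from the question:

    library(caret)

    # Reproduce what the formula interface does implicitly:
    # dummy-code factors into a numeric model matrix for glmnet.
    dv <- dummyVars(class ~ ., data = trainData)
    x  <- predict(dv, newdata = trainData)
    y  <- trainData$class

    glmTune <- train(x = x, y = y,
                     method = "glmnet",
                     trControl = train.control,
                     tuneGrid = tune.grid)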

All binary predictors in a classification task

天大地大妈咪最大 Submitted on 2019-12-24 07:24:21
Question: I am performing my analysis using R and will be implementing four algorithms: 1. RF, 2. logistic regression, 3. SVM, 4. LDA. I have 50 predictors and 1 target variable. All my predictors and the target variable are binary (0s and 1s). I have the following questions: Should I convert them all into factors? Converting them into factors and applying the RF algorithm gives 100% accuracy, which I am very surprised to see. Also, for the other algorithms, how should I treat my variables beforehand, before …
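For caret classification, the outcome must be a factor (with syntactically valid level names if class probabilities are requested), while 0/1 predictors can generally stay numeric, since a binary dummy carries the same information either way. A hedged sketch with a placeholder data frame df and target column target; the check at the end is one quick way to probe a suspicious 100% accuracy:

    library(caret)

    # Outcome as a factor with valid level names (needed for classProbs).
    df$target <- factor(df$target, levels = c(0, 1), labels = c("no", "yes"))

    ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE)
    rf_fit <- train(target ~ ., data = df, method = "rf", trControl = ctrl)

    # Perfect accuracy often signals target leakage: check whether any single
    # predictor almost exactly matches the target.
    agreement <- sapply(df[, setdiff(names(df), "target")],
                        function(p) mean(p == (as.numeric(df$target) - 1)))
    sort(agreement, decreasing = TRUE)[1:5]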

Caret on R spills "unable to find variable 'optimismBoot'" error message

徘徊边缘 Submitted on 2019-12-24 03:23:39
Question: I have been testing caret in R to try out its neural network features. The script below used to run correctly, but it has started outputting "unable to find variable optimismBoot".

    library(doParallel)
    cl <- makePSOCKcluster(4)
    registerDoParallel(cl)
    library(caret)
    m <- rbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
    train_data <- as.data.frame(m)
    nnmodel <- train(V3 ~ .,
                     data = train_data,
                     method = "nnet",
                     preProcess = c('center', 'scale'),
                     trControl = trainControl(method = "cv"),
                     tuneGrid = expand …
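This error was reported around a specific caret release as caret internals not being visible to parallel workers, so the commonly suggested remedies (both hedged, not confirmed by this page) are to update caret and to load caret before registering the parallel backend. A sketch of the reordered setup; the train() call itself is unchanged from the question:

    # Load caret first so its internal functions (such as optimismBoot)
    # are available when the PSOCK workers are spawned.
    library(caret)
    library(doParallel)

    cl <- makePSOCKcluster(4)
    registerDoParallel(cl)

    # ... run the train() call from the question here ...

    stopCluster(cl)

    # If the error persists, updating caret was the other commonly
    # reported fix, since this behaved like a bug in one release:
    # install.packages("caret")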

R caret: leave subject out cross validation with data subset for training?

☆樱花仙子☆ Submitted on 2019-12-24 01:44:10
Question: I want to perform leave-subject-out cross-validation with R caret (cf. this example) but only use a subset of the data in training for creating the CV models. Still, the left-out CV partition should be used as a whole, as I need to test on all data of a left-out subject (no matter if it's millions of samples that cannot be used in training due to computational restrictions). I've created a minimal 2-class classification example using the subset and index parameters of caret::train and caret: …
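trainControl accepts hand-built index (rows to train on) and indexOut (rows to evaluate on) lists, which makes exactly this asymmetric split possible: subsample the training rows per fold while keeping every row of the held-out subject. A hedged sketch, assuming a data frame df with an outcome y and a subject id column subject (all names hypothetical):

    library(caret)

    subjects <- unique(df$subject)

    # Training rows per fold: all other subjects, capped at 1000 rows
    # to respect the computational limit described above.
    index <- lapply(subjects, function(s) {
      train_rows <- which(df$subject != s)
      sample(train_rows, min(1000, length(train_rows)))
    })

    # Held-out rows per fold: every sample of the left-out subject.
    indexOut <- lapply(subjects, function(s) which(df$subject == s))

    ctrl <- trainControl(index = index, indexOut = indexOut)

    fit <- train(y ~ ., data = df[, setdiff(names(df), "subject")],
                 method = "glm",
                 trControl = ctrl)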