r-caret

Why is caret train taking up so much memory?

倾然丶 夕夏残阳落幕 Submitted on 2019-12-29 14:18:07
Question: When I train just using glm, everything works, and I don't even come close to exhausting memory. But when I run train(..., method='glm'), I run out of memory. Is this because train is storing a lot of data for each iteration of the cross-validation (or whatever the trControl procedure is)? I'm looking at trainControl and I can't find how to prevent this... any hints? I only care about the performance summary and maybe the predicted responses. (I know it's not related to storing data from …
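One way to cut the memory footprint is to tell caret not to keep copies of the data and resampling results in the returned object. A minimal sketch, assuming a reasonably recent caret release (the exact option values accepted, e.g. "none", can vary between versions); 'myData' and 'y' are placeholders for the data in the question:

    library(caret)

    # Keep as little as possible in the fitted train object:
    #   returnData = FALSE       -> don't store a copy of the training data
    #   returnResamp = "none"    -> don't keep per-resample performance rows
    #   savePredictions = "none" -> don't keep hold-out predictions
    #   trim = TRUE              -> strip final-model components not needed for predict()
    lean_ctrl <- trainControl(method = "cv",
                              number = 10,
                              returnData = FALSE,
                              returnResamp = "none",
                              savePredictions = "none",
                              trim = TRUE)

    fit <- train(y ~ ., data = myData,
                 method = "glm",
                 trControl = lean_ctrl)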

R Caret Package error imputing data with Pre-Process function

狂风中的少年 Submitted on 2019-12-25 05:31:46
Question: I have a dataset (training/testing) with missing data and I would like to impute the data before classification. I tried using the caret package and the function preProcess. I want to impute data using the predictor variables of the training set, and impute data on the testing set using only knowledge of the training set, without using the predictor of the testing set (which I should not know).

    p = preProcess(x = training, method = "knnImpute", k = 10)
    pred = predict(object = p, newdata = …
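This is the pattern preProcess is designed for: fit the imputation on the training predictors only, then apply that same fitted transformation to both sets. A minimal sketch, assuming data frames named training and testing that share predictor columns and an outcome column hypothetically called Class:

    library(caret)

    # The outcome must not leak into the imputation, so drop it first.
    # "Class" is a hypothetical outcome-column name.
    predictor_cols <- setdiff(names(training), "Class")

    # Fit the imputation model on the training predictors only.
    # Note: knnImpute also centers and scales the predictors as a side effect.
    pp <- preProcess(training[, predictor_cols], method = "knnImpute", k = 10)

    # Apply the same training-derived imputation to both sets.
    train_imp <- predict(pp, newdata = training[, predictor_cols])
    test_imp  <- predict(pp, newdata = testing[, predictor_cols])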

Sorting features based on their importance in CARET package

柔情痞子 Submitted on 2019-12-25 04:20:12
Question: In the caret package, the help for the related varImp() says: "Partial Least Squares: the variable importance measure here is based on weighted sums of the absolute regression coefficients. The weights are a function of the reduction of the sums of squares across the number of PLS components and are computed separately for each outcome. Therefore, the contribution of the coefficients are weighted proportionally to the reduction in the sums of squares." Below is the output of variable …
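To actually sort predictors by this measure, varImp() on a fitted train object returns an object whose $importance slot is a plain data frame that can be ordered. A minimal sketch, using iris as a stand-in dataset for the questioner's data:

    library(caret)

    # Fit a cross-validated PLS model.
    fit <- train(Species ~ ., data = iris,
                 method = "pls",
                 trControl = trainControl(method = "cv", number = 5))

    # $importance is a data frame (for PLS classification, one column per
    # outcome class); sort by the first column, highest importance first.
    imp <- varImp(fit)$importance
    imp_sorted <- imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
    head(imp_sorted)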

caret coefficients of cross validated set

末鹿安然 Submitted on 2019-12-25 01:56:01
Question: Is it possible to get the coefficients of all the cross-validation sets from the R caret package?

    library(MASS)    # for mvrnorm
    library(caret)

    set.seed(1)
    mu <- rep(0, 4)
    Sigma <- matrix(.7, nrow = 4, ncol = 4)
    diag(Sigma) <- 1
    rawvars <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)
    d <- as.ordered(as.numeric(rawvars[, 1] > 0.5))
    d[1:200] <- 1
    df <- data.frame(rawvars, d)
    ind <- sample(1:nrow(df), 500)
    train <- df[ind, ]
    test <- df[-ind, ]
    trControl <- trainControl(method = "repeatedcv", repeats = 1,
                              classProb = T, summaryFunction = twoClassSummary)
    fit …
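caret keeps only the final model in the train object, but the row indices used for each resample are stored in fit$control$index, so one hedged approach is to refit the same kind of model on each fold and collect the coefficients. A sketch, assuming a binomial glm is a reasonable stand-in for the (truncated) model in the question, with 'fit' the train object and 'train' the data frame from the code above:

    # fit$control$index holds the training-row indices of each resample.
    fold_coefs <- lapply(fit$control$index, function(idx) {
      m <- glm(d ~ ., data = train[idx, ], family = binomial)
      coef(m)
    })

    # One named coefficient vector per resample:
    str(fold_coefs)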

CHAID error using caret in R: model fit failed for Resample01: alpha2=0.05, alpha3=-1, alpha4=0.05 Error : is.factor(x) is not TRUE

≯℡__Kan透↙ Submitted on 2019-12-24 21:44:14
Question: CHAID error using caret in R: model fit failed for Resample01: alpha2=0.05, alpha3=-1, alpha4=0.05 Error : is.factor(x) is not TRUE. I'm getting the error above when trying to run a CHAID model in caret. The model runs fine with this data just by using the CHAID function. Any suggestions? Code below:

    model_weights <- ifelse(as.character(train_data$outcome) == "Sucess", 5.4, 1)
    model_tree_caret_cost = caret::train(outcome ~ ., data = train_data,
                                         method = "chaid",
                                         #tuneGrid = tunegrid,
                                         #costs = …
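A likely cause: CHAID requires every predictor to be a factor, but train's formula interface expands factors into numeric dummy columns before fitting, which is exactly what is.factor(x) then rejects. A hedged sketch of the usual workaround, the x/y (non-formula) interface, which passes the data frame through untouched:

    library(caret)

    # Keep predictors as factors; the x/y interface does not dummy-code them.
    x <- train_data[, setdiff(names(train_data), "outcome")]
    x[] <- lapply(x, as.factor)   # ensure every predictor really is a factor
    y <- train_data$outcome

    model_tree_caret <- caret::train(x = x, y = y,
                                     method = "chaid",
                                     weights = model_weights)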

Using linear regression (lm) in R caret, how do I force the intercept through 0? [duplicate]

空扰寡人 Submitted on 2019-12-24 08:57:20
Question: This question already has answers here: Fit a no-intercept model in caret (2 answers). Closed 3 months ago. I'm trying to use R caret to perform cross-validation of my linear regression models. In some cases I want to force the intercept through 0. I have tried the following, using the standard lm syntax:

    library(caret)

    regressControl <- trainControl(method = "repeatedcv",
                                   number = 4,
                                   repeats = 5)
    regress <- train(y ~ 0 + x,
                     data = myData,
                     method = "lm",
                     trControl = regressControl)

    Call: lm(formula = …
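In caret, the intercept is exposed as a tuning parameter of the "lm" method, so rather than encoding it in the formula (which train's formula machinery can silently undo), it can be set through tuneGrid. A minimal sketch using the same placeholder names as the question:

    library(caret)

    regressControl <- trainControl(method = "repeatedcv",
                                   number = 4,
                                   repeats = 5)

    # 'intercept' is the lone tuning parameter of caret's "lm" method;
    # setting it to FALSE forces the fit through the origin.
    regress <- train(y ~ x,
                     data = myData,
                     method = "lm",
                     trControl = regressControl,
                     tuneGrid = data.frame(intercept = FALSE))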

Formula vs non-formula interface in train()

不羁岁月 Submitted on 2019-12-24 08:17:26
Question: [I looked into similar threads here and on GitHub, and none of the issues suggested by Max and others seem to relate to my case.] I have seen some people here reporting that the formula interface fails while the non-formula interface works fine for them. My problem is the opposite. The train() call below, using the formula interface, works perfectly:

    glmTune <- train(class ~ ., data = trainData,
                     method = "glmnet",
                     trControl = train.control,
                     tuneGrid = tune.grid)

This one below gives NA errors:

    predictors …
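The usual difference between the two interfaces: the formula interface runs the data through model.matrix(), dummy-coding any factor predictors into the numeric matrix glmnet requires, whereas the x/y interface passes columns through as-is. A hedged sketch of making the non-formula call equivalent, using caret's dummyVars with the names from the question:

    library(caret)

    # Reproduce what the formula interface does implicitly:
    # dummy-code factors into a numeric model matrix for glmnet.
    dv <- dummyVars(class ~ ., data = trainData)
    x  <- predict(dv, newdata = trainData)
    y  <- trainData$class

    glmTune <- train(x = x, y = y,
                     method = "glmnet",
                     trControl = train.control,
                     tuneGrid = tune.grid)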

All binary predictors in a classification task

天大地大妈咪最大 Submitted on 2019-12-24 07:24:21
Question: I am performing my analysis using R and will be implementing four algorithms: 1. RF, 2. logistic regression, 3. SVM, 4. LDA. I have 50 predictors and 1 target variable. All my predictors and the target variable are binary (0s and 1s). I have the following questions: Should I convert them all into factors? Converting them into factors and applying the RF algorithm gives 100% accuracy, which I am very surprised to see. Also, for the other algorithms, how should I treat my variables beforehand, before …
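For caret classification, the outcome must be a factor (with syntactically valid level names if class probabilities are requested), while 0/1 predictors can generally stay numeric, since a binary dummy carries the same information either way. A hedged sketch with a placeholder data frame df and target column target; the check at the end is one quick way to probe a suspicious 100% accuracy:

    library(caret)

    # Outcome as a factor with valid level names (needed for classProbs).
    df$target <- factor(df$target, levels = c(0, 1), labels = c("no", "yes"))

    ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE)
    rf_fit <- train(target ~ ., data = df, method = "rf", trControl = ctrl)

    # Perfect accuracy often signals target leakage: check whether any single
    # predictor almost exactly matches the target.
    agreement <- sapply(df[, setdiff(names(df), "target")],
                        function(p) mean(p == (as.numeric(df$target) - 1)))
    sort(agreement, decreasing = TRUE)[1:5]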

Caret on R spills "unable to find variable 'optimismBoot'" error message

徘徊边缘 Submitted on 2019-12-24 03:23:39
Question: I have been testing caret in R to try out its neural network features. The script below used to run correctly, but it has started outputting "unable to find variable optimismBoot".

    library(doParallel)
    cl <- makePSOCKcluster(4)
    registerDoParallel(cl)
    library(caret)
    m <- rbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
    train_data <- as.data.frame(m)
    nnmodel <- train(V3 ~ .,
                     data = train_data,
                     method = "nnet",
                     preProcess = c('center', 'scale'),
                     trControl = trainControl(method = "cv"),
                     tuneGrid = expand …
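This error was reported around a specific caret release as caret internals not being visible to parallel workers, so the commonly suggested remedies (both hedged, not confirmed by this page) are to update caret and to load caret before registering the parallel backend. A sketch of the reordered setup; the train() call itself is unchanged from the question:

    # Load caret first so its internal functions (such as optimismBoot)
    # are available when the PSOCK workers are spawned.
    library(caret)
    library(doParallel)

    cl <- makePSOCKcluster(4)
    registerDoParallel(cl)

    # ... run the train() call from the question here ...

    stopCluster(cl)

    # If the error persists, updating caret was the other commonly
    # reported fix, since this behaved like a bug in one release:
    # install.packages("caret")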

R caret: leave subject out cross validation with data subset for training?

☆樱花仙子☆ Submitted on 2019-12-24 01:44:10
Question: I want to perform leave-subject-out cross-validation with R caret (cf. this example) but only use a subset of the data in training for creating the CV models. Still, the left-out CV partition should be used as a whole, as I need to test on all data of a left-out subject (no matter if it's millions of samples that cannot be used in training due to computational restrictions). I've created a minimal 2-class classification example using the subset and index parameters of caret::train and caret: …
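trainControl accepts hand-built index (rows to train on) and indexOut (rows to evaluate on) lists, which makes exactly this asymmetric split possible: subsample the training rows per fold while keeping every row of the held-out subject. A hedged sketch, assuming a data frame df with an outcome y and a subject id column subject (all names hypothetical):

    library(caret)

    subjects <- unique(df$subject)

    # Training rows per fold: all other subjects, capped at 1000 rows
    # to respect the computational limit described above.
    index <- lapply(subjects, function(s) {
      train_rows <- which(df$subject != s)
      sample(train_rows, min(1000, length(train_rows)))
    })

    # Held-out rows per fold: every sample of the left-out subject.
    indexOut <- lapply(subjects, function(s) which(df$subject == s))

    ctrl <- trainControl(index = index, indexOut = indexOut)

    fit <- train(y ~ ., data = df[, setdiff(names(df), "subject")],
                 method = "glm",
                 trControl = ctrl)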