r-caret

How to pass a character vector to the train function in caret (R)

こ雲淡風輕ζ · Submitted 2019-12-01 12:16:17
Question: I want to reduce the number of variables when I train my model. I have 784 features in total that I want to reduce to, say, 500. I can build a long string of the selected features with paste(), collapsing them with "+". For example, say this is my vector:

val <- "pixel40+pixel46+pixel48+pixel65+pixel66+pixel67"

Then I would like to pass it to the train function like so:

Rf_model <- train(label~val, data = training, method = "rf", ntree = 200, na.action = na.omit)

but
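A character string cannot be spliced into a formula like this; train() would look for a column literally named "val". One approach is to convert the string (or, more directly, the vector of feature names) into a formula object with reformulate() or as.formula(). A sketch, assuming a data frame `training` with a `label` column and pixel* predictor columns:

```r
library(caret)

# Selected feature names (hypothetical subset of the 784 pixel columns)
keep <- c("pixel40", "pixel46", "pixel48", "pixel65", "pixel66", "pixel67")

# reformulate() joins the terms with "+" and adds the response,
# producing: label ~ pixel40 + pixel46 + ... + pixel67
f <- reformulate(termlabels = keep, response = "label")

Rf_model <- train(f, data = training, method = "rf",
                  ntree = 200, na.action = na.omit)
```

Equivalently, an existing "+"-collapsed string can be turned into a formula with as.formula(paste("label ~", val)).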

Extract the coefficients for the best tuning parameters of a glmnet model in caret

假装没事ソ · Submitted 2019-12-01 06:01:11
I am running elastic-net regularization in caret using glmnet. I pass a sequence of alpha and lambda values, then I perform repeatedcv to find the optimal tunings of alpha and lambda. Here is an example where the optimal tunings are alpha = 0.7 and lambda = 0.5:

age <- c(4, 8, 7, 12, 6, 9, 10, 14, 7, 6, 8, 11, 11, 6, 2, 10, 14, 7, 12, 6, 9, 10, 14, 7)
gender <- make.names(as.factor(c(1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1)))
bmi_p <- c(0.86, 0.45, 0.99, 0.84, 0.85, 0.67, 0.91, 0.29, 0.88, 0.83, 0.48, 0.99, 0.80, 0.85, 0.50, 0
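For the question in the title: caret stores the winning alpha/lambda pair in the train object's bestTune slot and the underlying glmnet fit (over the whole lambda path) in finalModel. A sketch, assuming `model` is a fitted train() object with method = "glmnet":

```r
library(caret)
library(glmnet)

# Best tuning parameters chosen by the resampling procedure
best_alpha  <- model$bestTune$alpha
best_lambda <- model$bestTune$lambda

# Coefficients at the optimal lambda; returned as a sparse matrix
# in which zeroed-out terms are the ones elastic net dropped.
coef(model$finalModel, s = best_lambda)
```

Note that model$finalModel was already fit with the best alpha, so only lambda needs to be supplied to coef().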

Creating a data partition using caret and data.table

倖福魔咒の · Submitted 2019-12-01 05:57:25
I have a data.table in R that I want to use with the caret package:

set.seed(42)
trainingRows <- createDataPartition(DT$variable, p = 0.75, list = FALSE)
head(trainingRows) # view the sampled row numbers

However, I am not able to select the rows with data.table. Instead I had to convert to a data.frame:

DT_df <- as.data.frame(DT)
DT_train <- DT_df[trainingRows, ]
dim(DT_train)

The data.table alternative DT_train <- DT[.(trainingRows), ] requires the keys to be set. Any better option than converting to a data.frame?

Bruce Pucci: Roll your own:

inTrain <- sample(MyDT[, .I], floor(MyDT[, .N] * .75))
Train <-
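One point worth noting: createDataPartition() returns a one-column matrix, while data.table's `i` argument takes a plain integer vector; the `.()` join syntax is only needed for keyed lookups. A sketch on hypothetical data:

```r
library(data.table)
library(caret)

DT <- data.table(variable = rnorm(100), x = runif(100))

set.seed(42)
trainingRows <- createDataPartition(DT$variable, p = 0.75, list = FALSE)

# Drop the matrix dimension so data.table sees ordinary row numbers
idx <- as.vector(trainingRows)

DT_train <- DT[idx]    # integer subscripts work directly; no keys needed
DT_test  <- DT[-idx]   # complement for the test set
```

This keeps everything in data.table, avoiding the round trip through as.data.frame().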

Error: Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs), etc.

江枫思渺然 · Submitted 2019-12-01 04:24:27
Question: I am getting an error when using glmnet in caret. Example below.

Load libraries:
library(dplyr)
library(caret)
library(C50)

Load the churn data set from the C50 library:
data(churn)

Create the x and y variables:
churn_x <- subset(churnTest, select = -churn)
churn_y <- churnTest[[20]]

Use createFolds() to create 5 CV folds on churn_y, the target variable:
myFolds <- createFolds(churn_y, k = 5)

Create the trainControl object myControl:
myControl <- trainControl( summaryFunction = twoClassSummary, classProbs = TRUE, #
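The error text is cut off above, but a completed version of this setup might look like the following sketch. Two common triggers of lognet errors in this pattern are worth guarding against: with classProbs = TRUE the outcome's factor levels must be valid R names, and createFolds() by default returns the held-out rows, whereas trainControl's `index` expects the training rows of each resample (so returnTrain = TRUE is used here). This assumes the churn data still ships with the installed C50 version:

```r
library(caret)
library(C50)

data(churn)  # provides churnTrain / churnTest in older C50 releases

churn_x <- subset(churnTest, select = -churn)
churn_y <- churnTest$churn

# Class labels must be syntactically valid names when classProbs = TRUE
levels(churn_y) <- make.names(levels(churn_y))

# index wants TRAINING rows per resample, not held-out rows
myFolds <- createFolds(churn_y, k = 5, returnTrain = TRUE)

myControl <- trainControl(
  summaryFunction = twoClassSummary,
  classProbs = TRUE,
  savePredictions = TRUE,
  index = myFolds
)

model_glmnet <- train(x = churn_x, y = churn_y,
                      metric = "ROC", method = "glmnet",
                      trControl = myControl)
```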

Using neuralnet with caret train and adjusting the parameters

ぐ巨炮叔叔 · Submitted 2019-12-01 00:31:23
So I've read a paper that used neural networks to model a dataset similar to one I'm currently using. I have 160 descriptor variables that I want to model for 160 cases (regression modelling). The paper I read used the following parameters: 'For each split, a model was developed for each of the 10 individual train-test folds. A three-layer back-propagation net with 33 input neurons and 16 hidden neurons was used with online weight updates, 0.25 learning rate, and 0.9 momentum. For each fold, learning was conducted from a total of 50 different random initial weight
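A rough mapping of those settings onto caret's "neuralnet" method might look like the sketch below. This is not a faithful reproduction of the paper: caret's neuralnet wrapper tunes only the three hidden-layer sizes (layer1–layer3), and extras such as the algorithm and learning rate are passed straight through to neuralnet(), where learningrate applies only when algorithm = "backprop". The data frame `dat` is a stand-in for the 160 x 160 data set described above:

```r
library(caret)

set.seed(1)
dat <- as.data.frame(matrix(rnorm(160 * 161), nrow = 160))
names(dat)[161] <- "y"

# One hidden layer of 16 neurons, mirroring the paper's architecture
grid <- expand.grid(layer1 = 16, layer2 = 0, layer3 = 0)

ctrl <- trainControl(method = "cv", number = 10)

fit <- train(y ~ ., data = dat, method = "neuralnet",
             tuneGrid = grid, trControl = ctrl,
             algorithm = "backprop", learningrate = 0.25,
             linear.output = TRUE)  # regression output
```

The 0.9 momentum and 50 random restarts have no direct equivalents in this wrapper and would need a custom caret model or direct calls to neuralnet().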

R caret / rfe variable selection for factors() AND NAs

北战南征 · Submitted 2019-12-01 00:16:37
I have a data set with NAs sprinkled generously throughout. In addition, it has columns that need to be factors(). I am using the rfe() function from the caret package to select variables. It seems the functions= argument in rfe() with lmFuncs works on the data with NAs but NOT on factor variables, while rfFuncs works on factor variables but NOT with NAs. Any suggestions for dealing with this? I tried model.matrix(), but it seems to just cause more problems. Because of inconsistent behavior on these points between packages, not to mention the extra trickiness when going to more "meta"
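One workable compromise, sketched below on a hypothetical data frame `df` with a factor response `y`: impute the missing numeric values first (medianImpute is one simple choice), then run rfe() with rfFuncs, which handles factor predictors natively via random forests:

```r
library(caret)

# Impute NAs in the numeric predictor columns so rfFuncs can run
num_cols <- setdiff(names(df)[sapply(df, is.numeric)], "y")
pp <- preProcess(df[, num_cols, drop = FALSE], method = "medianImpute")
df[, num_cols] <- predict(pp, df[, num_cols, drop = FALSE])

# Factor columns are left as-is; random forests accept them directly
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
res  <- rfe(x = df[, setdiff(names(df), "y")], y = df$y,
            sizes = c(5, 10, 20), rfeControl = ctrl)
```

Rows where a factor value itself is NA would still need handling (e.g. an explicit "missing" level or row removal) before this runs.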

Dummy variables and preProcess

谁说我不能喝 · Submitted 2019-11-30 20:43:40
I have a data frame with some dummy variables that I want to use as a training set for glmnet. Since I'm using glmnet, I want to center and scale the features using the preProcess option in the caret train function. I don't want this transformation applied to the dummy variables as well. Is there a way to prevent the transformation of these variables?

There's not (currently) a way to do this besides writing a custom model to do so (see the example with PLS and RF near the end). I'm working on a method to specify which variables get which pre-processing method. However, with dummy variables,
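Until selective pre-processing is available, one workaround is to pre-process the continuous columns manually before calling train() and omit the preProcess argument entirely. A sketch on hypothetical data with a single 0/1 dummy:

```r
library(caret)

set.seed(3)
train_df <- data.frame(x1 = rnorm(50, 10, 3),
                       x2 = runif(50, 0, 100),
                       dummy = rbinom(50, 1, 0.5),
                       y = rnorm(50))

# Center/scale only the continuous columns, leaving the dummy untouched
cont <- c("x1", "x2")
pp <- preProcess(train_df[, cont], method = c("center", "scale"))
train_df[, cont] <- predict(pp, train_df[, cont])

# No preProcess argument here, so the dummy survives as 0/1
fit <- train(y ~ ., data = train_df, method = "glmnet")
```

The same preProcess object (`pp`) must then be applied to new data before predicting, since train() no longer does it automatically.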

How to preProcess features when some of them are factors?

陌路散爱 · Submitted 2019-11-30 17:46:38
My question is related to this one regarding categorical data (factors, in R terms) when using the caret package. I understand from the linked post that if you use the "formula interface", some features can be factors and training will work fine. My question is: how can I scale the data with the preProcess() function? If I try to do it on a data frame with some columns as factors, I get this error message:

Error in preProcess.default(etitanic, method = c("center", "scale")) : all columns of x must be numeric

See some sample code here:

library(earth)
data(etitanic)
a <- preProcess(etitanic,
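One workaround is to run preProcess() on the numeric columns only and leave the factors alone, sketched below on the same etitanic data (which mixes factor and numeric columns):

```r
library(caret)
library(earth)

data(etitanic)

# Identify the numeric columns; factors stay out of the scaling
num_cols <- sapply(etitanic, is.numeric)
pp <- preProcess(etitanic[, num_cols, drop = FALSE],
                 method = c("center", "scale"))

scaled <- etitanic
scaled[, num_cols] <- predict(pp, etitanic[, num_cols, drop = FALSE])
```

Alternatively, passing preProcess = c("center", "scale") to train() with the formula interface sidesteps the issue, since caret dummy-codes the factors into numeric columns before pre-processing.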

How to specify a validation holdout set to caret

回眸只為那壹抹淺笑 · Submitted 2019-11-30 16:01:10
I really like using caret for at least the early stages of modelling, especially for its really easy-to-use resampling methods. However, I'm working on a model where the training set has a fair number of cases added via semi-supervised self-training, and my cross-validation results are really skewed because of it. My solution is to use a validation set to measure model performance, but I can't see a way to use a validation set directly within caret - am I missing something, or is this just not supported? I know that I can write my own wrappers to do what caret would normally do for me, but it
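caret does support fixed holdouts via the index and indexOut arguments of trainControl(): each is a list with one element per "resample", holding the row numbers to train on and to evaluate on, so a single element gives exactly one train/validation split. A sketch on hypothetical data, with rows 101-120 serving as the validation set:

```r
library(caret)

set.seed(7)
dat <- data.frame(x1 = rnorm(120), x2 = rnorm(120))
dat$y <- 2 * dat$x1 - dat$x2 + rnorm(120)

train_idx <- 1:100    # rows used to fit each candidate model
valid_idx <- 101:120  # fixed validation rows for performance

# One "resample" whose held-out rows are exactly the validation set
ctrl <- trainControl(index    = list(holdout = train_idx),
                     indexOut = list(holdout = valid_idx))

fit <- train(y ~ ., data = dat, method = "lm", trControl = ctrl)
fit$resample  # metrics computed on the validation rows only
```

Tuning-parameter selection then uses validation-set performance rather than cross-validation, which should avoid the skew from the self-trained cases.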