r-caret

Time-series - data splitting and model evaluation

一曲冷凌霜 submitted on 2020-01-09 12:26:10

Question: I've tried to use machine learning to make predictions based on time-series data. One of the Stack Overflow questions (createTimeSlices function in CARET package in R) gives an example of using createTimeSlices for cross-validation during model training and parameter tuning:

    library(caret)
    library(ggplot2)
    library(pls)
    data(economics)
    myTimeControl <- trainControl(method = "timeslice",
                                  initialWindow = 36,
                                  horizon = 12,
                                  fixedWindow = TRUE)
    plsFitTime <- train(unemploy ~ pce + pop + psavert,
                        data = …
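
The question is cut off above mid-call. A minimal sketch of how the train() call might continue, assuming the "pls" method (the question loads library(pls)) and the economics data frame named in the formula; treat this as an illustration, not the asker's actual code:

    library(caret)
    library(ggplot2)   # provides data(economics)
    data(economics)

    # rolling-origin resampling: 36-observation training windows,
    # each evaluated over a 12-observation horizon
    myTimeControl <- trainControl(method = "timeslice",
                                  initialWindow = 36,
                                  horizon = 12,
                                  fixedWindow = TRUE)

    # assumption: partial least squares, matching library(pls) above
    plsFitTime <- train(unemploy ~ pce + pop + psavert,
                        data = economics,
                        method = "pls",
                        preProc = c("center", "scale"),
                        trControl = myTimeControl)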

Error: Please use column names for `x` when using caret() for logistic regression

折月煮酒 submitted on 2020-01-06 06:53:24

Question: I'd like to build a logistic regression model using the caret package. This is my code:

    library(caret)
    df <- data.frame(response = sample(0:1, 200, replace = TRUE),
                     predictor = rnorm(200, 10, 45))
    outcomeName <- "response"
    predictors <- names(df)[!(names(df) %in% outcomeName)]
    index <- createDataPartition(df$response, p = 0.75, list = FALSE)
    trainSet <- df[index, ]
    testSet <- df[-index, ]
    model_glm <- train(trainSet[, outcomeName], trainSet[, predictors],
                       method = 'glm', family = "binomial", data = trainSet)
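
Two things stand out in that call (my reading, not necessarily the accepted answer): the x and y arguments are swapped (train() expects the predictors first, then the outcome), and subsetting a single column with `[, ]` drops the result to a bare vector with no column names, which is exactly what the error message complains about. A hedged sketch of a fix:

    library(caret)

    set.seed(1)
    df <- data.frame(response  = sample(0:1, 200, replace = TRUE),
                     predictor = rnorm(200, 10, 45))
    index    <- createDataPartition(df$response, p = 0.75, list = FALSE)
    trainSet <- df[index, ]

    # x: a data frame of predictors (drop = FALSE keeps the column name);
    # y: the outcome as a factor so caret treats this as classification;
    # the data= argument belongs to the formula interface and is dropped here
    model_glm <- train(x = trainSet[, "predictor", drop = FALSE],
                       y = factor(trainSet$response),
                       method = "glm",
                       family = "binomial")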

Cross-validation predictions from caret are assigned to different folds

你说的曾经没有我的故事 submitted on 2020-01-06 06:45:10

Question: I am wondering why the predictions from 'Fold1' are actually predictions from the second fold in my predefined folds. I attach an example of what I mean:

    # load the library
    library(caret)
    # load the cars dataset
    data(cars)
    # define folds
    cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
    # define training control
    train_control <- trainControl(method = "cv", index = cv_folds,
                                  savePredictions = 'final')
    # fix the parameters of the algorithm
    # train the model
    model <- caret:…
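
The likely explanation (my reading of the setup, since the train() call is cut off): with returnTrain = TRUE, each element of cv_folds lists the training rows, and trainControl's index argument expects exactly that. The predictions labelled "Fold1" therefore come from the rows held out of cv_folds$Fold1, not the rows listed in it. A sketch that completes the call with an assumed "lm" method and checks this:

    library(caret)
    data(cars)

    set.seed(1)
    cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
    train_control <- trainControl(method = "cv", index = cv_folds,
                                  savePredictions = "final")

    # assumption: a plain linear model, since the original call is truncated
    model <- caret::train(Price ~ ., data = cars, method = "lm",
                          trControl = train_control)

    # rows predicted under "Fold1" are exactly those NOT in cv_folds$Fold1
    held_out   <- setdiff(seq_len(nrow(cars)), cv_folds$Fold1)
    fold1_rows <- model$pred$rowIndex[model$pred$Resample == "Fold1"]
    all(sort(unique(fold1_rows)) == held_out)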

Why does caret::predict() use parallel processing with XGBtree only?

戏子无情 submitted on 2020-01-06 05:26:11

Question: I understand why parallel processing can be used during training only for XGB and cannot be used for other models. However, I was surprised to notice that predict with xgb uses parallel processing too. I noticed this by accident when I split my large 10M+ row data frame into pieces to predict on using foreach %dopar%. This caused some errors, so to get around them I switched to sequential looping with %do%, but noticed in the terminal that all processors were being used. After some trial and …
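
What the asker is seeing is most likely xgboost's own OpenMP threading: predict() on an xgboost model multi-threads internally, independently of foreach. A hedged sketch of capping those threads before looping (an assumption on my part, not the thread's accepted answer):

    library(caret)
    library(xgboost)

    # assumption: `fit` is an existing caret model trained with method = "xgbTree";
    # cap xgboost's internal threads so they don't fight with %dopar% workers
    xgb.parameters(fit$finalModel) <- list(nthread = 1)

    preds <- predict(fit, newdata = new_chunk)  # new_chunk is a hypothetical piece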

Ensemble different datasets in R

女生的网名这么多〃 submitted on 2020-01-06 04:45:10

Question: I am trying to combine signals from different models using the example described here. I have different datasets that predict the same output. However, when I combine the model outputs in caretList and ensemble the signals, it gives an error:

    Error in check_bestpreds_resamples(modelLibrary) :
      Component models do not have the same re-sampling strategies

Here is the reproducible example:

    library(caret)
    library(caretEnsemble)
    df1 <- data.frame(x1 = rnorm(200),
                      x2 = rnorm(200),
                      y = as.factor…
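
The error is caretEnsemble's consistency check: every component model must be resampled on identical folds of the same rows, so models fit to different data frames cannot be ensembled directly. A sketch of the supported pattern, with one shared dataset and one shared fold definition (illustrative data, not the asker's):

    library(caret)
    library(caretEnsemble)

    set.seed(1)
    df <- data.frame(x1 = rnorm(200), x2 = rnorm(200),
                     y = factor(sample(c("yes", "no"), 200, replace = TRUE)))

    # one fold definition reused by every model => identical resampling
    folds <- createFolds(df$y, k = 5, returnTrain = TRUE)
    ctrl  <- trainControl(method = "cv", index = folds,
                          savePredictions = "final", classProbs = TRUE)

    models <- caretList(y ~ ., data = df, trControl = ctrl,
                        methodList = c("glm", "rpart"))
    ens <- caretEnsemble(models)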

Progress Bar for Model Training in Shiny R

陌路散爱 submitted on 2020-01-05 08:45:13

Question: I am making a Shiny app in which, at the click of an actionButton, a model is trained using the caret package. As this training takes time (approximately 4-5 minutes), I want to display a progress bar that advances as the model is trained. Thanks.

Answer 1: To display a progress bar in a Shiny app, you need to use the withProgress function in the server, as below:

    withProgress(message = "Model is Training", value = 1.0, {
      ## Your code
    })

So you put your code inside this function and it will display the …
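
A sketch of wiring the answer into a server function, assuming an actionButton with inputId "train" and a training data frame train_df (both hypothetical names). Note that train() is a single blocking call, so the bar is shown while it runs but cannot tick forward during it:

    library(shiny)
    library(caret)

    server <- function(input, output, session) {
      model <- eventReactive(input$train, {   # "train" is an assumed inputId
        withProgress(message = "Model is Training", value = 1.0, {
          # train_df and the formula are placeholders for the asker's own model
          train(y ~ ., data = train_df, method = "glm")
        })
      })
    }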

R: Something is wrong; all the Accuracy metric values are missing

徘徊边缘 submitted on 2020-01-05 05:39:05

Question: When running rpart, I am getting an error message saying:

    Something is wrong; all the Accuracy metric values are missing

The dataset can be found here and has no NAs. Can someone help?

    > rf.5.cv.1
    # Random Forest
    # 891 samples
    #   6 predictor
    #   2 classes: '0', '1'
    # No pre-processing
    # Resampling: Cross-Validated (10 fold, repeated 10 times)
    # Summary of sample sizes: 802, 802, 803, 801, 801, 802, ...
    # Resampling results across tuning parameters:
    #   mtry  Accuracy   Kappa
    #   2     0.8383655  0…
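
This error means train() could not compute the classification metric on any resample. A frequent cause (an assumption about this dataset, which at 891 rows looks like the Titanic training set) is an outcome coded as numeric 0/1, which makes caret fit a regression where Accuracy is undefined. A sketch of the usual fix with a hypothetical train_df:

    library(caret)

    # assumption: Survived is the 0/1 outcome column in train_df
    train_df$Survived <- factor(train_df$Survived, labels = c("No", "Yes"))

    ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
    fit  <- train(Survived ~ ., data = train_df,
                  method = "rpart", trControl = ctrl)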

R Caret Random Forest AUC too good to be true?

时光毁灭记忆、已成空白 submitted on 2020-01-05 02:27:09

Question: I'm a relative newbie to predictive modeling; most of my training/experience is in inferential stats. I'm trying to predict whether students graduate college within 4 years. The basic issue is that I've done data cleaning (imputing, centering, scaling); split the processed/transformed data into training (70%) and testing (30%) sets; and balanced the data using two approaches (because the data was 65% 0s and 35% 1s, and I've found inconsistent advice on what counts as unbalanced, but one source suggested anything not …
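
An AUC that looks too good to be true after balancing is a classic symptom of leakage: if oversampling (or imputation and scaling) is applied before the train/test split, near-copies of test rows end up in the training set. caret can instead rebalance inside each resample via trainControl's sampling argument; a sketch of that idea (my suggestion, with hypothetical names students and grad, not the thread's accepted answer):

    library(caret)

    ctrl <- trainControl(method = "cv", number = 10,
                         sampling = "down",   # rebalance within each fold only
                         classProbs = TRUE,
                         summaryFunction = twoClassSummary)

    # `students` (data frame) and `grad` (factor outcome) are stand-ins
    fit <- train(grad ~ ., data = students, method = "rf",
                 metric = "ROC", trControl = ctrl)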

Creating data partitions over a selected range of data to be fed into caret::train function for cross-validation

非 Y 不嫁゛ submitted on 2020-01-04 14:15:07

Question: I want to create jack-knife data partitions for the data frame below, with the partitions to be used in caret::train (like the ones caret::groupKFold() produces). However, the catch is that I want to restrict the test points to, say, Time greater than 16 days, while using the remainder of the data as the training set.

    df <- data.frame(Effect = seq(from = 0.05, to = 1, by = 0.05),
                     Time = seq(1:20))

The reason I want to do this is that I am only really interested in how well the model is predicting the …
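
trainControl accepts hand-built index (training rows) and indexOut (held-out rows) lists, so the jack-knife can be restricted to the late time points by hand. A sketch of my reading of the question, not the accepted answer:

    library(caret)

    df <- data.frame(Effect = seq(from = 0.05, to = 1, by = 0.05),
                     Time = seq(1:20))

    # one resample per row with Time > 16: hold that single row out,
    # train on all 19 remaining rows
    test_rows <- which(df$Time > 16)
    index     <- lapply(test_rows, function(i) setdiff(seq_len(nrow(df)), i))
    indexOut  <- lapply(test_rows, function(i) i)
    names(index) <- names(indexOut) <- paste0("JK", test_rows)

    ctrl <- trainControl(method = "cv", index = index, indexOut = indexOut,
                         savePredictions = "final")
    fit  <- train(Effect ~ Time, data = df, method = "lm", trControl = ctrl)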

stepwise regression using caret in R [closed]

你。 submitted on 2020-01-03 06:43:29

Question: [Closed: this question needs to be more focused and is not currently accepting answers. Closed 2 years ago.]

I have used the leaps package in R to perform forward and backward feature elimination. However, I want to automate the cross-validation and prediction operations. Therefore, how can I use forward/backward selection in caret? In the leaps package you could do it this way:

    forward <- …
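
caret wraps leaps directly through method = "leapForward", "leapBackward", and "leapSeq", with nvmax (the maximum subset size) as the tuning parameter, which automates exactly the cross-validation the asker wants. A sketch using mtcars as a stand-in, since the question's data isn't shown:

    library(caret)
    library(leaps)

    ctrl <- trainControl(method = "cv", number = 10)

    fit <- train(mpg ~ ., data = mtcars,
                 method = "leapBackward",              # or "leapForward" / "leapSeq"
                 tuneGrid = data.frame(nvmax = 1:5),   # subset sizes to compare
                 trControl = ctrl)
    fit$bestTune   # cross-validated choice of subset size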