r-caret

Caret and dummy variables

主宰稳场 submitted on 2019-12-12 21:24:47
Question: When calling the train function of the caret package, the data is automatically transformed so that all factor variables are turned into a set of dummy variables. How can I prevent this behaviour? Is it possible to tell caret "don't transform factors into dummy variables"? For example, if I run the rpart algorithm on the etitanic data:

library(caret)
library(earth)
data(etitanic)
etitanic$survived[etitanic$survived==1] <- 'YES'
etitanic$survived[etitanic$survived!='YES'] <- 'NO'
model<
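The behaviour comes from the formula interface: train(survived ~ ., ...) builds a model matrix, which expands factors into dummy variables. Passing the predictors and outcome separately (the non-formula interface) leaves factor columns intact for methods such as rpart that handle them natively. A minimal sketch using the etitanic data from the question:

```r
library(caret)
library(earth)   # provides the etitanic data
data(etitanic)

etitanic$survived <- factor(ifelse(etitanic$survived == 1, "YES", "NO"))

# Formula interface: train() builds a model matrix, so factor predictors
# (e.g. sex, pclass) get expanded into dummy variables:
# model_f <- train(survived ~ ., data = etitanic, method = "rpart")

# Non-formula interface: predictors are passed as a data frame, so factor
# columns stay factors and rpart can split on them directly.
model_nf <- train(x = etitanic[, names(etitanic) != "survived"],
                  y = etitanic$survived,
                  method = "rpart")
```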

r - caret package error - createDataPartition: no observations

孤街醉人 submitted on 2019-12-12 17:15:27
Question: I'm getting the following error when I try to run createDataPartition in caret:

Error in createDataPartition(data1, p = 0.8, list = FALSE) : y must have at least 2 data points

I ran the exact same code last night with no errors. Any thoughts?

predictors <- with(df, data.frame(xvar, xvar, xvar, xvar))
data1 <- with(dfu2, data.frame(data1))
library(caret)
set.seed(1)
trainingRows <- createDataPartition(data1, p=.80, list=FALSE)

> dput(head(data1, 15))
structure(list(data1 = c(1L, 1L, 1L, 1L,
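A likely cause, assuming data1 is the one-column data frame the code suggests: length() of a data frame is its number of columns, so createDataPartition sees a y of length 1 and stops with this error. Passing the outcome column as an atomic vector avoids it. A hedged sketch with made-up data:

```r
library(caret)
set.seed(1)

# Made-up stand-in for the question's data1
y_df <- data.frame(data1 = c(1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L))

# length() of a data frame counts columns, so this one-column data frame
# looks like a single data point to createDataPartition():
length(y_df)   # 1  -> "y must have at least 2 data points"

# Pass the outcome column as a vector instead:
trainingRows <- createDataPartition(y_df$data1, p = 0.80, list = FALSE)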

Support vector machine train caret error: kernlab class probability calculations failed; returning NAs

那年仲夏 submitted on 2019-12-12 08:54:13
Question: I have some data and the Y variable is a factor - Good or Bad. I am building a support vector machine using the 'train' method from the 'caret' package. Using the 'train' function I was able to finalize values of various tuning parameters and got the final support vector machine. For the test data I can predict the 'class'. But when I try to predict probabilities for the test data, I get the error below (for example, my model tells me that the 1st data point in the test data has y='good', but I want to know what is the
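The usual fix is to request class probabilities at training time: with classProbs = TRUE in trainControl, caret fits kernlab's ksvm with prob.model = TRUE, and predict(..., type = "prob") then works; a model trained without it is what typically produces the "class probability calculations failed; returning NAs" warning. A sketch on stand-in two-class data (the question's Good/Bad data is not shown):

```r
library(caret)
set.seed(42)

# Stand-in two-class data for the question's Good/Bad outcome
dat <- iris[iris$Species != "setosa", ]
dat$Species <- factor(dat$Species)   # drop the unused level

# classProbs = TRUE makes caret fit kernlab's ksvm with prob.model = TRUE;
# without it, predict(..., type = "prob") fails and returns NAs.
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_fit <- train(Species ~ ., data = dat,
                 method = "svmRadial",
                 trControl = ctrl)

head(predict(svm_fit, newdata = dat, type = "prob"))
```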

Installing the caret package in R

心不动则不痛 submitted on 2019-12-12 05:35:20
Question: How long does it generally take to install the caret package in R? I ran install.packages('caret', dependencies = TRUE), and R has been running the install for close to an hour now. Is this normal?

Answer 1: It shouldn't take that long. I actually had to install the caret package on one of my machines earlier today and it took less than a minute. Sounds like you might want to check your connection speed.

Answer 2: With dependencies, it can take a while; mine took about 30 minutes and my download speed was

“Something is wrong; all the Accuracy metric values are missing” Error in Caret Training

大憨熊 submitted on 2019-12-12 04:08:05
Question: I'm having the same issue as here, but the solutions are not working for me. I'm not sure what I'm doing wrong... Here is my code:

# ensure results are repeatable
set.seed(7)
# load the library
library(caret)
# load the dataset
dataset <- read.csv("C:\\Users\\asholmes\\Documents\\diabetes_r.csv", na.strings='.')
# convert to data frame
as.data.frame(dataset, stringsAsFactors=TRUE)
# create x and y
x <- dataset[, 1:15]
y <- dataset[, 16]
# prepare training scheme
control <- trainControl(method=
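Two common causes of this error (assumed here, since the data and the rest of the code are cut off) are NA values in the predictors - na.strings='.' turns those '.' entries into NA - and an outcome that is not a factor with valid R level names. A hedged sketch with made-up data:

```r
library(caret)
set.seed(7)

# Made-up stand-in for the diabetes data, including one NA
dataset <- data.frame(x1 = c(1, 2, NA, 4:20),
                      x2 = rnorm(20),
                      y  = rep(c(0, 1), 10))

# Classification needs a factor outcome with syntactically valid level names
dataset$y <- factor(dataset$y, labels = c("No", "Yes"))

colSums(is.na(dataset))   # locate missing values that silently break resampling

fit <- train(y ~ ., data = dataset,
             method = "glm",
             na.action = na.omit,   # or impute, e.g. preProcess = "medianImpute"
             trControl = trainControl(method = "cv", number = 3))
```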

“Something is wrong; all the Accuracy metric values are missing:”

China☆狼群 submitted on 2019-12-12 03:51:41
Question: I took the following code from the textbook "Machine Learning with R" by Brett Lantz and copied it into the console exactly as written:

> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> library(kernlab)

Attaching package: 'kernlab'

The following object is masked from 'package:ggplot2':

    alpha

> set.seed(300)
> ctrl <- trainControl(method = "cv", number = 10)
> bagctrl <- bagControl(fit = svmBag$fit, predict = svmBag$pred, aggregate = svmBag

How to random search in a specified grid in caret package?

空扰寡人 submitted on 2019-12-12 01:26:20
Question: I wonder whether it is possible to use random search within a predefined grid. For example, my grid has alpha and lambda for the glmnet method. alpha is between 0 and 1, and lambda is between -10 and 10. I want to use random search 5 times to randomly try points within these bounds. I wrote the following code for grid search and it works fine, but I cannot modify it for random search within a bound.

rand_ctrl <- trainControl(method = "repeatedcv", repeats = 5, search = "random")
grid <- expand.grid(alpha=seq(0,1,0.1)
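search = "random" samples from caret's built-in parameter ranges, not from a user-supplied grid. One way to get random search within your own bounds is to build the full grid with expand.grid, sample 5 rows from it, and pass those rows as tuneGrid so train() evaluates only the sampled points (lambda is assumed here to be on a log10 scale, since glmnet's lambda must be positive):

```r
library(caret)
set.seed(1)

# Full candidate grid: alpha in [0, 1], lambda = 10^(-10..10)
grid <- expand.grid(alpha  = seq(0, 1, by = 0.1),
                    lambda = 10^seq(-10, 10, by = 1))

# Randomly pick 5 points from the grid and evaluate only those
rand_grid <- grid[sample(nrow(grid), 5), ]

ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)
# fit <- train(y ~ ., data = dat, method = "glmnet",   # dat is hypothetical
#              trControl = ctrl, tuneGrid = rand_grid)
```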

Huge model output size from the train function in the R caret package

不打扰是莪最后的温柔 submitted on 2019-12-12 01:26:12
Question: I am training a bagFDA model using the train() function in the R caret package, and saving the model output as an .Rdata file. The input file is about 300k records with 26 variables, but the output .Rdata has a size of 3 GB. I simply run the following under Windows:

modelout <- train(x, y, method="bagFDA")
save(file= "myout.Rdata", modelout)

Questions: (1) Why is myout.Rdata so big? (2) How can I reduce the size of the file? Thanks in advance! JT

Answer 1: In trainControl, set returnData = FALSE for
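Building on Answer 1: the trainControl options below keep train() from storing a copy of the training data and the resampling output, which is usually what inflates the saved object, and saveRDS with stronger compression helps further. A sketch (x and y are the question's undisclosed inputs, so the fit itself is left commented):

```r
library(caret)

ctrl <- trainControl(method = "boot", number = 25,
                     returnData = FALSE,        # don't embed the 300k-row training set
                     returnResamp = "none",     # drop per-resample results
                     savePredictions = FALSE,   # drop hold-out predictions
                     trim = TRUE)               # strip fit components not needed to predict
                                                # (where the method supports trimming)

# modelout <- train(x, y, method = "bagFDA", trControl = ctrl)
# saveRDS(modelout, "myout.rds", compress = "xz")  # tighter than save() defaults
```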

Error with caret package - classification vs regression

馋奶兔 submitted on 2019-12-11 17:18:08
Question: I am an actuarial student preparing for an upcoming predictive analytics exam in December. Part of an exercise is to build a model using boosting with caret and xgbTree. See the code below; the Caravan dataset is from the ISLR package:

library(caret)
library(ggplot2)
set.seed(1000)
data.Caravan <- read.csv(file = "Caravan.csv")
data.Caravan$Purchase <- factor(data.Caravan$Purchase)
levels(data.Caravan$Purchase) <- c("No", "Yes")
data.Caravan.train <- data.Caravan[1:1000, ]
data.Caravan.test <
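Since the code is cut off before the error, one general point worth checking: caret chooses between classification and regression from the class of the outcome, so Purchase must stay a factor with valid level names (as the question's levels(...) <- c("No", "Yes") ensures) everywhere it is used. A hedged sketch using ISLR's built-in copy of the data instead of Caravan.csv:

```r
library(caret)
library(ISLR)
set.seed(1000)

data.Caravan <- Caravan                      # same data as the question's Caravan.csv
stopifnot(is.factor(data.Caravan$Purchase))  # factor outcome => classification task

data.Caravan.train <- data.Caravan[1:1000, ]
data.Caravan.test  <- data.Caravan[-(1:1000), ]

ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
# The boosting fit would follow (slow, so left commented):
# fit <- train(Purchase ~ ., data = data.Caravan.train,
#              method = "xgbTree", trControl = ctrl)
```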

Specifying a selected range of data to be used in leave-one-out (jack-knife) cross-validation for use in the caret::train function

时光毁灭记忆、已成空白 submitted on 2019-12-11 15:33:10
Question: This question builds on the one I asked here: Creating data partitions over a selected range of data to be fed into caret::train function for cross-validation. The data I am working with looks like this:

df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5),
                 Time = rep(c(1:20,1:20), each = 5),
                 Replicate = c(1:5))

Essentially, what I would like to do is create custom partitions, like those generated by the caret::groupKFold function, but for these folds to be
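The mechanism that generalizes here: trainControl accepts explicit folds through index (a list of training-row positions, one element per resample) and indexOut (the matching held-out rows), so any custom leave-one-group-out scheme over a chosen range can be supplied directly. A sketch holding out one Time value at a time, restricted to Time <= 10 (an assumed range, for illustration):

```r
library(caret)

df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5),
                 Time = rep(c(1:20, 1:20), each = 5),
                 Replicate = c(1:5))

# One fold per Time value in the chosen range: hold out all rows with that Time
held_out  <- lapply(1:10, function(t) which(df$Time == t))
train_idx <- lapply(held_out, function(rows) setdiff(seq_len(nrow(df)), rows))

ctrl <- trainControl(method = "cv",
                     index = train_idx,     # rows used for fitting in each fold
                     indexOut = held_out)   # rows evaluated in each fold
# fit <- train(Effect ~ Time, data = df, method = "lm", trControl = ctrl)
```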