r-caret

Caret and dummy variables

主宰稳场 submitted on 2019-12-12 21:24:47
Question: When calling the train function of the caret package, the data is automatically transformed so that all factor variables are turned into a set of dummy variables. How can I prevent this behaviour? Is it possible to tell caret "don't transform factors into dummy variables"? For example, if I run the rpart algorithm on the etitanic data:

library(caret)
library(earth)
data(etitanic)
etitanic$survived[etitanic$survived==1] <- 'YES'
etitanic$survived[etitanic$survived!='YES'] <- 'NO'
model<
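The behaviour comes from the formula interface: train(survived ~ ., ...) builds a model matrix, which expands factors into dummy variables. Passing the predictors and outcome separately (the non-formula interface) leaves factor columns intact for methods such as rpart that handle them natively. A minimal sketch using the etitanic data from the question:

```r
library(caret)
library(earth)   # provides the etitanic data
data(etitanic)

etitanic$survived <- factor(ifelse(etitanic$survived == 1, "YES", "NO"))

# Formula interface: train() builds a model matrix, so factor predictors
# (e.g. sex, pclass) get expanded into dummy variables:
# model_f <- train(survived ~ ., data = etitanic, method = "rpart")

# Non-formula interface: predictors are passed as a data frame, so factor
# columns stay factors and rpart can split on them directly.
model_nf <- train(x = etitanic[, names(etitanic) != "survived"],
                  y = etitanic$survived,
                  method = "rpart")
```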

r - caret package error - createDataPartition: no observations

孤街醉人 submitted on 2019-12-12 17:15:27
Question: I'm getting the following error when I try to run createDataPartition in caret:

Error in createDataPartition(data1, p = 0.8, list = FALSE) : y must have at least 2 data points

I ran the exact same code last night with no errors. Any thoughts?

predictors <- with(df, data.frame(xvar, xvar, xvar, xvar))
data1 <- with(dfu2, data.frame(data1))
library(caret)
set.seed(1)
trainingRows <- createDataPartition(data1, p=.80, list=FALSE)

> dput(head(data1, 15))
structure(list(data1 = c(1L, 1L, 1L, 1L,
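A likely cause, assuming data1 is the one-column data frame the code suggests: length() of a data frame is its number of columns, so createDataPartition sees a y of length 1 and stops with this error. Passing the outcome column as an atomic vector avoids it. A hedged sketch with made-up data:

```r
library(caret)
set.seed(1)

# Made-up stand-in for the question's data1
y_df <- data.frame(data1 = c(1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L))

# length() of a data frame counts columns, so this one-column data frame
# looks like a single data point to createDataPartition():
length(y_df)   # 1  -> "y must have at least 2 data points"

# Pass the outcome column as a vector instead:
trainingRows <- createDataPartition(y_df$data1, p = 0.80, list = FALSE)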

Support vector machine train caret error: kernlab class probability calculations failed; returning NAs

那年仲夏 submitted on 2019-12-12 08:54:13
Question: I have some data and the Y variable is a factor - Good or Bad. I am building a support vector machine using the 'train' method from the 'caret' package. Using the 'train' function I was able to finalize values of various tuning parameters and got the final support vector machine. For the test data I can predict the 'class'. But when I try to predict probabilities for the test data, I get the error below (for example, my model tells me that the 1st data point in the test data has y='good', but I want to know what is the
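The usual fix is to request class probabilities at training time: with classProbs = TRUE in trainControl, caret fits kernlab's ksvm with prob.model = TRUE, and predict(..., type = "prob") then works; a model trained without it is what typically produces the "class probability calculations failed; returning NAs" warning. A sketch on stand-in two-class data (the question's Good/Bad data is not shown):

```r
library(caret)
set.seed(42)

# Stand-in two-class data for the question's Good/Bad outcome
dat <- iris[iris$Species != "setosa", ]
dat$Species <- factor(dat$Species)   # drop the unused level

# classProbs = TRUE makes caret fit kernlab's ksvm with prob.model = TRUE;
# without it, predict(..., type = "prob") fails and returns NAs.
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_fit <- train(Species ~ ., data = dat,
                 method = "svmRadial",
                 trControl = ctrl)

head(predict(svm_fit, newdata = dat, type = "prob"))
```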

Installing the caret package in R

心不动则不痛 submitted on 2019-12-12 05:35:20
Question: How long does it generally take to install the caret package in R? I ran install.packages('caret', dependencies = TRUE), and R has been running the install for close to an hour now. Is this normal?

Answer 1: It shouldn't take that long. I actually had to install the caret package on one of my machines earlier today and it took less than a minute. Sounds like you might want to check your connection speed.

Answer 2: With dependencies, it can take a while; mine took about 30 minutes and my download speed was

“Something is wrong; all the Accuracy metric values are missing” Error in Caret Training

大憨熊 submitted on 2019-12-12 04:08:05
Question: I'm having the same issue as here, but the solutions are not working for me. I'm not sure what I'm doing wrong... Here is my code:

# ensure results are repeatable
set.seed(7)
# load the library
library(caret)
# load the dataset
dataset <- read.csv("C:\\Users\\asholmes\\Documents\\diabetes_r.csv", na.strings='.')
# convert to data frame
as.data.frame(dataset, stringsAsFactors=TRUE)
# create x and y
x <- dataset[, 1:15]
y <- dataset[, 16]
# prepare training scheme
control <- trainControl(method=
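Two common causes of this error (assumed here, since the data and the rest of the code are cut off) are NA values in the predictors - na.strings='.' turns those '.' entries into NA - and an outcome that is not a factor with valid R level names. A hedged sketch with made-up data:

```r
library(caret)
set.seed(7)

# Made-up stand-in for the diabetes data, including one NA
dataset <- data.frame(x1 = c(1, 2, NA, 4:20),
                      x2 = rnorm(20),
                      y  = rep(c(0, 1), 10))

# Classification needs a factor outcome with syntactically valid level names
dataset$y <- factor(dataset$y, labels = c("No", "Yes"))

colSums(is.na(dataset))   # locate missing values that silently break resampling

fit <- train(y ~ ., data = dataset,
             method = "glm",
             na.action = na.omit,   # or impute, e.g. preProcess = "medianImpute"
             trControl = trainControl(method = "cv", number = 3))
```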

“Something is wrong; all the Accuracy metric values are missing:”

China☆狼群 submitted on 2019-12-12 03:51:41
Question: I took the following code from the textbook "Machine Learning with R" by Brett Lantz and copied it into the console exactly as written:

> library(caret)
Loading required package: lattice
Loading required package: ggplot2
> library(kernlab)

Attaching package: 'kernlab'

The following object is masked from 'package:ggplot2':

    alpha

> set.seed(300)
> ctrl <- trainControl(method = "cv", number = 10)
> bagctrl <- bagControl(fit = svmBag$fit, predict = svmBag$pred, aggregate = svmBag

How to random search in a specified grid in caret package?

空扰寡人 submitted on 2019-12-12 01:26:20
Question: I wonder whether it is possible to use random search within a predefined grid. For example, my grid has alpha and lambda for the glmnet method. alpha is between 0 and 1, and lambda is between -10 and 10. I want to use random search 5 times to randomly try points within these bounds. I wrote the following code for grid search and it works fine, but I cannot modify it for random search within a bound.

rand_ctrl <- trainControl(method = "repeatedcv", repeats = 5, search = "random")
grid <- expand.grid(alpha=seq(0,1,0.1)
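search = "random" samples from caret's built-in parameter ranges, not from a user-supplied grid. One way to get random search within your own bounds is to build the full grid with expand.grid, sample 5 rows from it, and pass those rows as tuneGrid so train() evaluates only the sampled points (lambda is assumed here to be on a log10 scale, since glmnet's lambda must be positive):

```r
library(caret)
set.seed(1)

# Full candidate grid: alpha in [0, 1], lambda = 10^(-10..10)
grid <- expand.grid(alpha  = seq(0, 1, by = 0.1),
                    lambda = 10^seq(-10, 10, by = 1))

# Randomly pick 5 points from the grid and evaluate only those
rand_grid <- grid[sample(nrow(grid), 5), ]

ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 5)
# fit <- train(y ~ ., data = dat, method = "glmnet",   # dat is hypothetical
#              trControl = ctrl, tuneGrid = rand_grid)
```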

Huge model output size from the train function in the R caret package

不打扰是莪最后的温柔 submitted on 2019-12-12 01:26:12
Question: I am training a bagFDA model using the train() function in the R caret package, and saving the model output as an .Rdata file. The input file is about 300k records with 26 variables, but the output .Rdata has a size of 3 GB. I simply run the following under Windows:

modelout <- train(x, y, method="bagFDA")
save(file= "myout.Rdata", modelout)

Questions: (1) Why is myout.Rdata so big? (2) How can I reduce the size of the file? Thanks in advance! JT

Answer 1: In trainControl, set returnData = FALSE for
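Building on Answer 1: the trainControl options below keep train() from storing a copy of the training data and the resampling output, which is usually what inflates the saved object, and saveRDS with stronger compression helps further. A sketch (x and y are the question's undisclosed inputs, so the fit itself is left commented):

```r
library(caret)

ctrl <- trainControl(method = "boot", number = 25,
                     returnData = FALSE,        # don't embed the 300k-row training set
                     returnResamp = "none",     # drop per-resample results
                     savePredictions = FALSE,   # drop hold-out predictions
                     trim = TRUE)               # strip fit components not needed to predict
                                                # (where the method supports trimming)

# modelout <- train(x, y, method = "bagFDA", trControl = ctrl)
# saveRDS(modelout, "myout.rds", compress = "xz")  # tighter than save() defaults
```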

Error with caret package - classification vs regression

馋奶兔 submitted on 2019-12-11 17:18:08
Question: I am an actuarial student preparing for an upcoming predictive analytics exam in December. Part of an exercise is to build a model using boosting with caret and xgbTree. See the code below; the Caravan dataset is from the ISLR package:

library(caret)
library(ggplot2)
set.seed(1000)
data.Caravan <- read.csv(file = "Caravan.csv")
data.Caravan$Purchase <- factor(data.Caravan$Purchase)
levels(data.Caravan$Purchase) <- c("No", "Yes")
data.Caravan.train <- data.Caravan[1:1000, ]
data.Caravan.test <
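Since the code is cut off before the error, one general point worth checking: caret chooses between classification and regression from the class of the outcome, so Purchase must stay a factor with valid level names (as the question's levels(...) <- c("No", "Yes") ensures) everywhere it is used. A hedged sketch using ISLR's built-in copy of the data instead of Caravan.csv:

```r
library(caret)
library(ISLR)
set.seed(1000)

data.Caravan <- Caravan                      # same data as the question's Caravan.csv
stopifnot(is.factor(data.Caravan$Purchase))  # factor outcome => classification task

data.Caravan.train <- data.Caravan[1:1000, ]
data.Caravan.test  <- data.Caravan[-(1:1000), ]

ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)
# The boosting fit would follow (slow, so left commented):
# fit <- train(Purchase ~ ., data = data.Caravan.train,
#              method = "xgbTree", trControl = ctrl)
```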

Specifying a selected range of data to be used in leave-one-out (jack-knife) cross-validation for use in the caret::train function

时光毁灭记忆、已成空白 submitted on 2019-12-11 15:33:10
Question: This question builds on the one I asked here: Creating data partitions over a selected range of data to be fed into caret::train function for cross-validation. The data I am working with looks like this:

df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5),
                 Time = rep(c(1:20,1:20), each = 5),
                 Replicate = c(1:5))

Essentially, what I would like to do is create custom partitions, like those generated by the caret::groupKFold function, but for these folds to be
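The mechanism that generalizes here: trainControl accepts explicit folds through index (a list of training-row positions, one element per resample) and indexOut (the matching held-out rows), so any custom leave-one-group-out scheme over a chosen range can be supplied directly. A sketch holding out one Time value at a time, restricted to Time <= 10 (an assumed range, for illustration):

```r
library(caret)

df <- data.frame(Effect = rep(seq(from = 0.05, to = 1, by = 0.05), each = 5),
                 Time = rep(c(1:20, 1:20), each = 5),
                 Replicate = c(1:5))

# One fold per Time value in the chosen range: hold out all rows with that Time
held_out  <- lapply(1:10, function(t) which(df$Time == t))
train_idx <- lapply(held_out, function(rows) setdiff(seq_len(nrow(df)), rows))

ctrl <- trainControl(method = "cv",
                     index = train_idx,     # rows used for fitting in each fold
                     indexOut = held_out)   # rows evaluated in each fold
# fit <- train(Effect ~ Time, data = df, method = "lm", trControl = ctrl)
```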