r-caret

R caret model evaluation with alternate performance metric

Submitted by 我们两清 on 2019-12-05 05:46:05
I'm using R's caret package to do some grid search and model evaluation. I have a custom evaluation metric that is a weighted average of absolute error. Weights are assigned at the observation level.

X <- c(1,1,2,0,1) # feature 1
w <- c(1,2,2,1,1) # weights
Y <- 1:5          # target, continuous

# assume I run a model using X as features and Y as target
# and get a vector of predictions
mymetric <- function(predictions, target, weights){
  v <- sum(abs(target - predictions) * weights) / sum(weights)
  return(v)
}

Here an example is given on how to use summaryFunction to define a custom evaluation metric for caret's
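The excerpt cuts off before showing the summaryFunction wiring. A minimal sketch of how this weighted-MAE metric could be expressed in caret's summaryFunction format: the data frame caret passes in has obs and pred columns, and (in recent caret versions) a weights column when case weights are supplied to train(). The function name and the commented wiring below are my own assumptions, not from the original post.

```r
# Weighted mean absolute error in caret's summaryFunction format.
# Assumption: caret exposes case weights as a `weights` column when
# `weights` is passed to train(); we fall back to equal weights otherwise.
wmaeSummary <- function(data, lev = NULL, model = NULL) {
  w <- if ("weights" %in% names(data)) data$weights else rep(1, nrow(data))
  out <- sum(abs(data$obs - data$pred) * w) / sum(w)
  names(out) <- "WMAE"
  out
}

# It would then be wired up roughly like this (sketch):
# ctrl <- trainControl(summaryFunction = wmaeSummary)
# train(x = X, y = Y, weights = w, metric = "WMAE",
#       maximize = FALSE, trControl = ctrl)
```

Note the maximize = FALSE: caret maximizes the chosen metric by default, which would be wrong for an error measure.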

How to predict on a new dataset using caretEnsemble package in R?

Submitted by 耗尽温柔 on 2019-12-05 04:43:35
Question: I am currently using the caretEnsemble package in R to combine multiple models trained in caret. I have the list of final trained models (say model_list) from the caretList function of the same package, as follows.

model_list <- caretList(
  x = input_predictors,
  y = input_labels,
  metric = 'Accuracy',
  tuneList = list(
    randomForestModel = caretModelSpec(method='rf', tuneLength=1, preProcess=c('BoxCox', 'center', 'scale')),
    ldaModel = caretModelSpec(method='lda', tuneLength=1, preProcess=c(
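The question is truncated, but a hedged sketch of the usual prediction step with caretEnsemble: the trained members are first blended into a single ensemble object, and predict() is then called on the new data (the variable names below are assumptions, not from the post).

```r
library(caretEnsemble)

# Blend the caretList members into one linear ensemble
ensemble <- caretEnsemble(model_list)

# Predict on a new data frame that has the same predictor columns
# the members were trained on
preds <- predict(ensemble, newdata = new_predictors)

# For classification, class probabilities can usually be requested too:
# probs <- predict(ensemble, newdata = new_predictors, type = "prob")
```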

Parallel execution of train in caret fails with function not found

Submitted by 丶灬走出姿态 on 2019-12-05 03:09:30
Yesterday I updated my R packages, and since then parallel execution of the train function fails. It seems that some functions called from within the workers are not available; at least flatTable and probFunction are affected. I am experiencing this issue on my production machine, and was able to reproduce it on a clean Windows 7 x64 VM. I have added a minimal working example below. Dear users of Stack Overflow: any help is appreciated!

# R 3.0.2 x64, RStudio Version 0.98.490, Windows 7 x64
data(iris)
library(caret)      # 6.0-21
library(doParallel) # 1.0.6
model <- "rf"
# Fail ?probFunction
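A workaround that has resolved similar "function not found" errors, where the worker processes don't have caret's internal helpers such as probFunction on their search path, is to load caret explicitly on each worker before registering the backend. This is a sketch of that workaround, not a guaranteed fix for this particular version combination:

```r
library(caret)
library(doParallel)

cl <- makeCluster(detectCores() - 1)
clusterEvalQ(cl, library(caret))  # make caret's internals visible on each worker
registerDoParallel(cl)

# ... run train(...) here ...

stopCluster(cl)
```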

caret train rf model - inexplicably long execution

Submitted by 不羁的心 on 2019-12-05 01:46:19
Question: While trying to train a random forest model with the caret package, I noticed that the execution time is inexplicably long:

> set.seed = 1;
> n = 500;
> m = 30;
> x = matrix(rnorm(n * m), nrow = n);
> y = factor(sample.int(2, n, replace = T), labels = c("yes", "no"))
> require(caret);
> require(randomForest);
> print(system.time({rf <- randomForest(x, y);}));
   user  system elapsed
   0.99    0.00    0.98
> print(system.time({rfmod <- train(x = x, y = y,
+                                   method = "rf",
+                                   metric = "Accuracy",
+                                   trControl =
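Part of the gap is expected: a bare randomForest() call fits one forest, while train() by default fits one forest per tuning-grid row per bootstrap resample, i.e. dozens of forests. A sketch that should bring train() close to the single-fit time by disabling resampling and fixing mtry (the mtry value used here is simply randomForest's usual classification default, floor(sqrt(p))):

```r
library(caret)

ctrl <- trainControl(method = "none")   # no resampling: a single model fit
rfmod <- train(x = x, y = y,
               method = "rf",
               trControl = ctrl,
               tuneGrid = data.frame(mtry = floor(sqrt(ncol(x)))))
```

With method = "none", no tuning or performance estimation happens, so this is only useful once the tuning parameters are already decided.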

Parallelizing Caret code

Submitted by 微笑、不失礼 on 2019-12-04 22:45:54
I am having a hard time figuring out why this code does not parallelize. I am taking the reproducible example straight from the caret web page.

library(caret)
library(mlbench)
library(Hmisc)
library(randomForest)
library(doMC)
registerDoMC(cores = 3)

n <- 100
p <- 40
sigma <- 1
set.seed(1)
sim <- mlbench.friedman1(n, sd = sigma)
colnames(sim$x) <- c(paste("real", 1:5, sep = ""), paste("bogus", 1:5, sep = ""))
bogus <- matrix(rnorm(n * p), nrow = n)
colnames(bogus) <- paste("bogus", 5 + (1:ncol(bogus)), sep = "")
x <- cbind(sim$x, bogus)
y <- sim$y
normalization <- preProcess(x)
x <- predict
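One quick sanity check before blaming caret: confirm the doMC backend actually registered, since train() only parallelizes over resamples when a foreach backend is active (and doMC uses forking, so it works on Unix-alikes only, not Windows):

```r
library(doMC)
library(foreach)

registerDoMC(cores = 3)
getDoParWorkers()     # should report 3 if the backend registered
getDoParRegistered()  # TRUE when a parallel backend is active
```

If these report 1 and FALSE, the backend never attached and train() will run sequentially regardless of its allowParallel setting.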

Warnings while using the Naive Bayes Classifier in the Caret Package

Submitted by 女生的网名这么多〃 on 2019-12-04 22:02:16
I am attempting to run a supervised machine learning classifier known as Naive Bayes in the caret package. My data is called LDA.scores, and has two categorical factors called "V4" and "G8", and 12 predictor variables. The code that I am using was adapted by a kind person on Stack Overflow from code supplied by myself (see link below). The code does work; however, only 9 predictors were used instead of the 12 predictors in the dataset. When I tried to train the Naive Bayes model with the full dataset [2:13], the code failed. My next step was to systematically run the code with a subset of
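The warnings caret's method = "nb" (klaR's NaiveBayes under the hood) typically emits with continuous predictors are zero-probability/numeric-underflow warnings; turning on kernel density estimation and Laplace smoothing through the tuning grid often quiets them. A sketch only: the column indices come from the post, everything else (grid values, outcome choice, resampling setup) is assumed:

```r
library(caret)

# fL = Laplace correction, usekernel = kernel density estimates
# instead of a Gaussian assumption, adjust = KDE bandwidth factor
nb_grid <- expand.grid(fL = 1, usekernel = TRUE, adjust = 1)

nb_fit <- train(x = LDA.scores[, 2:13],   # all 12 predictors, per the post
                y = LDA.scores$V4,        # one of the two factor outcomes
                method = "nb",
                tuneGrid = nb_grid,
                trControl = trainControl(method = "cv", number = 5))
```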

Create RMSLE metric in caret in r

Submitted by 拟墨画扇 on 2019-12-04 21:07:19
Could someone please help me with the following: I need to change my xgboost training model with the caret package to a non-default metric, RMSLE. By default, caret and xgboost train and measure in RMSE. Here are the lines of code:

# create custom summary function in caret format
custom_summary = function(data, lev = NULL, model = NULL){
  out = rmsle(data[, "obs"], data[, "pred"])
  names(out) = c("rmsle")
  out
}

# create control object
control = trainControl(method = "cv", number = 2, summaryFunction = custom_summary)

# create grid of tuning parameters
grid = expand.grid(nrounds = 100, max_depth = 6, eta = 0
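The snippet above depends on an rmsle() helper (presumably from the Metrics package). A self-contained variant computes RMSLE directly in base R inside the summary function, avoiding the extra dependency; note that train() must also be told metric = "rmsle" and maximize = FALSE, since caret otherwise tries to maximize the metric:

```r
custom_summary <- function(data, lev = NULL, model = NULL){
  # RMSLE in base R; assumes obs and pred are non-negative
  out <- sqrt(mean((log1p(data$pred) - log1p(data$obs))^2))
  names(out) <- "rmsle"
  out
}

# Sketch of the wiring:
# control <- trainControl(method = "cv", number = 2,
#                         summaryFunction = custom_summary)
# train(..., trControl = control, metric = "rmsle", maximize = FALSE)
```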

Feature selection with caret rfe and training with another method

Submitted by 烈酒焚心 on 2019-12-04 20:19:31
Right now, I'm trying to use caret's rfe function to perform feature selection, because I'm in a p >> n situation and most regression techniques that don't involve some sort of regularisation can't be used well. I have already used a few techniques with regularisation (lasso), but what I want to try now is to reduce my number of features so that I'm able to run, at least decently, any kind of regression algorithm on it.

control <- rfeControl(functions=rfFuncs, method="cv", number=5)
model <- rfe(trainX, trainY, rfeControl=control)
predict(model, testX)

Right now, if I do it like this, a feature
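A sketch of the usual two-step pattern: feature selection with rfe, then training a different learner on only the selected columns. caret's predictors() returns the variables the final rfe model kept; the downstream method below is just an illustrative choice, not prescribed by the post:

```r
library(caret)

control <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rfe_fit <- rfe(trainX, trainY, rfeControl = control)

selected <- predictors(rfe_fit)          # names of the retained features

# Train any other regression method on the reduced feature set
final_fit <- train(trainX[, selected, drop = FALSE], trainY,
                   method = "glmnet")    # example method only
predict(final_fit, testX[, selected, drop = FALSE])
```

Caveat worth keeping in mind: selecting features on the full training set and then cross-validating the second model on the same data leaks information; nesting the selection inside the resampling is the cleaner design.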

Caret::train - Values Not Imputed

Submitted by 梦想与她 on 2019-12-04 17:44:34
Question: I am trying to impute values by passing "knnImpute" to the preProcess argument of caret's train() method. Based on the following example, it appears that the values are not imputed, remain as NA, and are then ignored. What am I doing wrong? Any help is much appreciated.

library("caret")
set.seed(1234)
data(iris)

# mark 8 of the cells as NA, so they can be imputed
row <- sample(1:nrow(iris), 8)
iris[row, 1] <- NA

# split test vs training
train.index <- createDataPartition(y = iris[,5], p =
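A common cause of this symptom is that rows containing NA are removed by train()'s na.action before preProcess ever runs, so knnImpute never sees them. Passing na.action = na.pass lets the incomplete rows through to the imputation step. A sketch against the iris setup from the post (the training-split name and model method are assumptions):

```r
library(caret)

model <- train(Species ~ ., data = iris.train,  # iris.train: the training split
               method = "knn",
               preProcess = c("knnImpute"),
               na.action = na.pass)             # keep NA rows so they get imputed
```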

Working with text classification and big sparse matrices in R

Submitted by 不羁的心 on 2019-12-04 15:44:51
I'm working on a text multi-class classification project and I need to build the document/term matrices, then train and test, in the R language. I already have datasets that don't fit in the limited dimensionality of R's base matrix class and would need to build big sparse matrices to be able to classify, for example, 100k tweets. I am using the quanteda package, as it has so far been more useful and reliable than the tm package, where creating a DocumentTermMatrix with a dictionary makes the process incredibly memory-hungry even with small datasets. Currently, as I said, I use quanteda to build
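For reference, the quanteda path being described can stay sparse end to end: the dfm object is built on the Matrix package's sparse classes, so a 100k-document matrix never needs to be densified. A sketch with illustrative object names (tweet_texts and labels are assumptions):

```r
library(quanteda)

# tweet_texts: a character vector of documents (illustrative name)
toks <- tokens(tweet_texts, remove_punct = TRUE)
dtm  <- dfm(toks)   # sparse document-feature matrix

# Because dfm inherits from Matrix's sparse classes, it can feed
# learners that accept sparse input directly, e.g.:
# fit <- glmnet::cv.glmnet(dtm, labels, family = "multinomial")
```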