r-caret

R caret model evaluation with alternate performance metric

Submitted by 我们两清 on 2019-12-05 05:46:05
I'm using R's caret package to do some grid search and model evaluation. I have a custom evaluation metric that is a weighted average of absolute error. Weights are assigned at the observation level.

X <- c(1,1,2,0,1) # feature 1
w <- c(1,2,2,1,1) # weights
Y <- 1:5          # target, continuous

# assume I run a model using X as features and Y as target
# and get a vector of predictions
mymetric <- function(predictions, target, weights){
  v <- sum(abs(target - predictions) * weights) / sum(weights)
  return(v)
}

Here an example is given on how to use summaryFunction to define a custom evaluation metric for caret's
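The excerpt cuts off before showing the summaryFunction wiring. A minimal sketch of how this weighted-MAE metric could be expressed in caret's summaryFunction format: the data frame caret passes in has obs and pred columns, and (in recent caret versions) a weights column when case weights are supplied to train(). The function name and the commented wiring below are my own assumptions, not from the original post.

```r
# Weighted mean absolute error in caret's summaryFunction format.
# Assumption: caret exposes case weights as a `weights` column when
# `weights` is passed to train(); we fall back to equal weights otherwise.
wmaeSummary <- function(data, lev = NULL, model = NULL) {
  w <- if ("weights" %in% names(data)) data$weights else rep(1, nrow(data))
  out <- sum(abs(data$obs - data$pred) * w) / sum(w)
  names(out) <- "WMAE"
  out
}

# It would then be wired up roughly like this (sketch):
# ctrl <- trainControl(summaryFunction = wmaeSummary)
# train(x = X, y = Y, weights = w, metric = "WMAE",
#       maximize = FALSE, trControl = ctrl)
```

Note the maximize = FALSE: caret maximizes the chosen metric by default, which would be wrong for an error measure.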

How to predict on a new dataset using caretEnsemble package in R?

Submitted by 耗尽温柔 on 2019-12-05 04:43:35
Question: I am currently using the caretEnsemble package in R to combine multiple models trained in caret. I have the list of final trained models (say model_list) from the caretList function of the same package, as follows.

model_list <- caretList(
  x = input_predictors,
  y = input_labels,
  metric = 'Accuracy',
  tuneList = list(
    randomForestModel = caretModelSpec(method='rf', tuneLength=1, preProcess=c('BoxCox', 'center', 'scale')),
    ldaModel = caretModelSpec(method='lda', tuneLength=1, preProcess=c(
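The question is truncated, but a hedged sketch of the usual prediction step with caretEnsemble: the trained members are first blended into a single ensemble object, and predict() is then called on the new data (the variable names below are assumptions, not from the post).

```r
library(caretEnsemble)

# Blend the caretList members into one linear ensemble
ensemble <- caretEnsemble(model_list)

# Predict on a new data frame that has the same predictor columns
# the members were trained on
preds <- predict(ensemble, newdata = new_predictors)

# For classification, class probabilities can usually be requested too:
# probs <- predict(ensemble, newdata = new_predictors, type = "prob")
```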

Parallel execution of train in caret fails with function not found

Submitted by 丶灬走出姿态 on 2019-12-05 03:09:30
Yesterday I updated my R packages, and since then parallel execution of the train function fails. It seems that some functions called from within the workers are not available; at least flatTable and probFunction are affected. I am experiencing this issue on my production machine, and was able to reproduce it on a clean Windows 7 x64 VM. I have added a minimal working example below. Dear users of Stack Overflow: any help is appreciated!

# R 3.0.2 x64, RStudio Version 0.98.490, Windows 7 x64
data(iris)
library(caret)      # 6.0-21
library(doParallel) # 1.0.6
model <- "rf"
# Fail ?probFunction
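A workaround that has resolved similar "function not found" errors, where the worker processes don't have caret's internal helpers such as probFunction on their search path, is to load caret explicitly on each worker before registering the backend. This is a sketch of that workaround, not a guaranteed fix for this particular version combination:

```r
library(caret)
library(doParallel)

cl <- makeCluster(detectCores() - 1)
clusterEvalQ(cl, library(caret))  # make caret's internals visible on each worker
registerDoParallel(cl)

# ... run train(...) here ...

stopCluster(cl)
```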

caret train rf model - inexplicably long execution

Submitted by 不羁的心 on 2019-12-05 01:46:19
Question: While trying to train a random forest model with the caret package, I noticed that the execution time is inexplicably long:

> set.seed = 1;
> n = 500;
> m = 30;
> x = matrix(rnorm(n * m), nrow = n);
> y = factor(sample.int(2, n, replace = T), labels = c("yes", "no"))
> require(caret);
> require(randomForest);
> print(system.time({rf <- randomForest(x, y);}));
   user  system elapsed
   0.99    0.00    0.98
> print(system.time({rfmod <- train(x = x, y = y,
+                                   method = "rf",
+                                   metric = "Accuracy",
+                                   trControl =
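Part of the gap is expected: a bare randomForest() call fits one forest, while train() by default fits one forest per tuning-grid row per bootstrap resample, i.e. dozens of forests. A sketch that should bring train() close to the single-fit time by disabling resampling and fixing mtry (the mtry value used here is simply randomForest's usual classification default, floor(sqrt(p))):

```r
library(caret)

ctrl <- trainControl(method = "none")   # no resampling: a single model fit
rfmod <- train(x = x, y = y,
               method = "rf",
               trControl = ctrl,
               tuneGrid = data.frame(mtry = floor(sqrt(ncol(x)))))
```

With method = "none", no tuning or performance estimation happens, so this is only useful once the tuning parameters are already decided.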

Parallelizing Caret code

Submitted by 微笑、不失礼 on 2019-12-04 22:45:54
I am having a hard time figuring out why this code does not parallelize. I am taking the reproducible example straight from the caret web page.

library(caret)
library(mlbench)
library(Hmisc)
library(randomForest)
library(doMC)
registerDoMC(cores = 3)

n <- 100
p <- 40
sigma <- 1
set.seed(1)
sim <- mlbench.friedman1(n, sd = sigma)
colnames(sim$x) <- c(paste("real", 1:5, sep = ""), paste("bogus", 1:5, sep = ""))
bogus <- matrix(rnorm(n * p), nrow = n)
colnames(bogus) <- paste("bogus", 5 + (1:ncol(bogus)), sep = "")
x <- cbind(sim$x, bogus)
y <- sim$y
normalization <- preProcess(x)
x <- predict
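One quick sanity check before blaming caret: confirm the doMC backend actually registered, since train() only parallelizes over resamples when a foreach backend is active (and doMC uses forking, so it works on Unix-alikes only, not Windows):

```r
library(doMC)
library(foreach)

registerDoMC(cores = 3)
getDoParWorkers()     # should report 3 if the backend registered
getDoParRegistered()  # TRUE when a parallel backend is active
```

If these report 1 and FALSE, the backend never attached and train() will run sequentially regardless of its allowParallel setting.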

Warnings while using the Naive Bayes Classifier in the Caret Package

Submitted by 女生的网名这么多〃 on 2019-12-04 22:02:16
I am attempting to run a supervised machine learning classifier known as Naive Bayes in the caret package. My data is called LDA.scores, and has two categorical factors called "V4" and "G8", and 12 predictor variables. The code that I am using was adapted by a kind person on Stack Overflow from code supplied by myself (see link below). The code does work; however, only 9 predictors were used instead of the 12 predictors in the dataset. When I tried to train the Naive Bayes model with the full dataset [2:13], the code failed. My next step was to systematically run the code with a subset of
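The warnings caret's method = "nb" (klaR's NaiveBayes under the hood) typically emits with continuous predictors are zero-probability/numeric-underflow warnings; turning on kernel density estimation and Laplace smoothing through the tuning grid often quiets them. A sketch only: the column indices come from the post, everything else (grid values, outcome choice, resampling setup) is assumed:

```r
library(caret)

# fL = Laplace correction, usekernel = kernel density estimates
# instead of a Gaussian assumption, adjust = KDE bandwidth factor
nb_grid <- expand.grid(fL = 1, usekernel = TRUE, adjust = 1)

nb_fit <- train(x = LDA.scores[, 2:13],   # all 12 predictors, per the post
                y = LDA.scores$V4,        # one of the two factor outcomes
                method = "nb",
                tuneGrid = nb_grid,
                trControl = trainControl(method = "cv", number = 5))
```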

Create RMSLE metric in caret in r

Submitted by 拟墨画扇 on 2019-12-04 21:07:19
Could someone please help me with the following: I need to change my xgboost training model with the caret package to a non-default metric, RMSLE. By default, caret and xgboost train and measure in RMSE. Here are the lines of code:

# create custom summary function in caret format
custom_summary = function(data, lev = NULL, model = NULL){
  out = rmsle(data[, "obs"], data[, "pred"])
  names(out) = c("rmsle")
  out
}

# create control object
control = trainControl(method = "cv", number = 2, summaryFunction = custom_summary)

# create grid of tuning parameters
grid = expand.grid(nrounds = 100, max_depth = 6, eta = 0
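The snippet above depends on an rmsle() helper (presumably from the Metrics package). A self-contained variant computes RMSLE directly in base R inside the summary function, avoiding the extra dependency; note that train() must also be told metric = "rmsle" and maximize = FALSE, since caret otherwise tries to maximize the metric:

```r
custom_summary <- function(data, lev = NULL, model = NULL){
  # RMSLE in base R; assumes obs and pred are non-negative
  out <- sqrt(mean((log1p(data$pred) - log1p(data$obs))^2))
  names(out) <- "rmsle"
  out
}

# Sketch of the wiring:
# control <- trainControl(method = "cv", number = 2,
#                         summaryFunction = custom_summary)
# train(..., trControl = control, metric = "rmsle", maximize = FALSE)
```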

Feature selection with caret rfe and training with another method

Submitted by 烈酒焚心 on 2019-12-04 20:19:31
Right now, I'm trying to use caret's rfe function to perform feature selection, because I'm in a p >> n situation and most regression techniques that don't involve some sort of regularisation can't be used well. I have already used a few techniques with regularisation (lasso), but what I want to try now is to reduce my number of features so that I'm able to run, at least decently, any kind of regression algorithm on it.

control <- rfeControl(functions=rfFuncs, method="cv", number=5)
model <- rfe(trainX, trainY, rfeControl=control)
predict(model, testX)

Right now, if I do it like this, a feature
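A sketch of the usual two-step pattern: feature selection with rfe, then training a different learner on only the selected columns. caret's predictors() returns the variables the final rfe model kept; the downstream method below is just an illustrative choice, not prescribed by the post:

```r
library(caret)

control <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rfe_fit <- rfe(trainX, trainY, rfeControl = control)

selected <- predictors(rfe_fit)          # names of the retained features

# Train any other regression method on the reduced feature set
final_fit <- train(trainX[, selected, drop = FALSE], trainY,
                   method = "glmnet")    # example method only
predict(final_fit, testX[, selected, drop = FALSE])
```

Caveat worth keeping in mind: selecting features on the full training set and then cross-validating the second model on the same data leaks information; nesting the selection inside the resampling is the cleaner design.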

Caret::train - Values Not Imputed

Submitted by 梦想与她 on 2019-12-04 17:44:34
Question: I am trying to impute values by passing "knnImpute" to the preProcess argument of caret's train() method. Based on the following example, it appears that the values are not imputed, remain as NA, and are then ignored. What am I doing wrong? Any help is much appreciated.

library("caret")
set.seed(1234)
data(iris)

# mark 8 of the cells as NA, so they can be imputed
row <- sample(1:nrow(iris), 8)
iris[row, 1] <- NA

# split test vs training
train.index <- createDataPartition(y = iris[,5], p =
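A common cause of this symptom is that rows containing NA are removed by train()'s na.action before preProcess ever runs, so knnImpute never sees them. Passing na.action = na.pass lets the incomplete rows through to the imputation step. A sketch against the iris setup from the post (the training-split name and model method are assumptions):

```r
library(caret)

model <- train(Species ~ ., data = iris.train,  # iris.train: the training split
               method = "knn",
               preProcess = c("knnImpute"),
               na.action = na.pass)             # keep NA rows so they get imputed
```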

Working with text classification and big sparse matrices in R

Submitted by 不羁的心 on 2019-12-04 15:44:51
I'm working on a text multi-class classification project and I need to build the document/term matrices, then train and test, in the R language. I already have datasets that don't fit in the limited dimensionality of R's base matrix class and would need to build big sparse matrices to be able to classify, for example, 100k tweets. I am using the quanteda package, as it has so far been more useful and reliable than the tm package, where creating a DocumentTermMatrix with a dictionary makes the process incredibly memory-hungry even with small datasets. Currently, as I said, I use quanteda to build
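For reference, the quanteda path being described can stay sparse end to end: the dfm object is built on the Matrix package's sparse classes, so a 100k-document matrix never needs to be densified. A sketch with illustrative object names (tweet_texts and labels are assumptions):

```r
library(quanteda)

# tweet_texts: a character vector of documents (illustrative name)
toks <- tokens(tweet_texts, remove_punct = TRUE)
dtm  <- dfm(toks)   # sparse document-feature matrix

# Because dfm inherits from Matrix's sparse classes, it can feed
# learners that accept sparse input directly, e.g.:
# fit <- glmnet::cv.glmnet(dtm, labels, family = "multinomial")
```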