r-caret | 易学教程

R caret model evaluation with alternate performance metric

Question: I'm using R's caret package to do some grid search and model evaluation. I have a custom evaluation metric that is a weighted average of absolute error. Weights are assigned at the observation level.
    X <- c(1, 1, 2, 0, 1)  # feature 1
    w <- c(1, 2, 2, 1, 1)  # weights
    Y <- 1:5               # target, continuous
    # assume I run a model using X as features and Y as target and get a vector of predictions
    mymetric <- function(predictions, target, weights) {
      v <- sum(abs(target - predictions) * weights) / sum(weights)
      return(v)
    }
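
One way the metric above might be plugged into caret is sketched here, in base R only. The wrapper name `weighted_mae_summary` is hypothetical. It follows the `function(data, lev, model)` signature that caret's `summaryFunction` expects and assumes that `data` carries `obs`, `pred` and, when case weights were passed to `train()`, a `weights` column. The predictions in the toy check are made up.

```r
# Weighted mean absolute error, as defined in the question.
weighted_mae <- function(predictions, target, weights) {
  sum(abs(target - predictions) * weights) / sum(weights)
}

# Hypothetical wrapper following caret's summaryFunction signature.
# Assumption: `data` has columns `obs` and `pred`, plus a `weights`
# column when case weights were supplied; otherwise equal weights are used.
weighted_mae_summary <- function(data, lev = NULL, model = NULL) {
  w <- if (!is.null(data$weights)) data$weights else rep(1, nrow(data))
  c(WMAE = weighted_mae(data$pred, data$obs, w))
}

# Toy check: the question's targets and weights, with made-up predictions.
toy <- data.frame(obs = 1:5, pred = c(1, 3, 3, 3, 5), weights = c(1, 2, 2, 1, 1))
print(weighted_mae_summary(toy))
```

If the assumption about the `weights` column holds, the wrapper could be passed as `trainControl(summaryFunction = weighted_mae_summary)` together with `train(..., weights = w, metric = "WMAE", maximize = FALSE)`.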

Error: Package “ggplot2” could not be found, when loading the caret package

Question: When I install caret with
    install.packages("caret", dependencies=c("Depends", "Suggests"))
    library(caret)
I get:
    ## Loading required package: lattice
    ## Loading required package: ggplot2
    Error in LoadNamespace(i, c(lib.loc, .libPaths()), versionCheck=vI[[i]]): there is no package called 'digest'
    Error: package 'ggplot2' could not be loaded.
So I resolved the issue with the digest package by installing caret using the code below, and what do I get again:
    install.packages("caret", dep="TRUE")
    library(caret)

Parallel execution of train in caret fails with function not found

Question: Yesterday I updated my R packages, and since then parallel execution of the train function fails. It seems that some functions called from within the workers are not available; these include at least flatTable and probFunction. I am experiencing this issue on my production machine and was able to reproduce it on a clean Windows 7 x64 VM. I added a minimal working example below. Dear users of Stack Overflow: any help is appreciated!
    # R 3.0.2 x64, RStudio Version 0.98.490, Windows

Difference between predict(model) and predict(model$finalModel) using caret for classification in R

Question: What's the difference between predict(rf, newdata=testSet) and predict(rf$finalModel, newdata=testSet)? I train the model with preProcess=c("center", "scale"):
    tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T)
    rf <- train(y~., data=trainingSet, method="rf", trControl=tc, preProc=c("center", "scale"))
and I receive 0 true positives when I run it on a centered and scaled testSet:
    testSetCS <- testSet
    xTrans <- preProcess(testSetCS)
    testSetCS<- predict(xTrans,
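
A likely source of the discrepancy is where the centering and scaling statistics come from. As far as I understand caret, predict() on the train object reapplies the preprocessing estimated on the training data, while predict() on finalModel expects data that has already been transformed that way. The base R sketch below uses made-up numbers and no caret calls. It shows how scaling a test set with its own statistics differs from scaling it with the training statistics.

```r
# Made-up single-feature data; the test set is shifted relative to training.
train_x <- data.frame(a = 1:10)
test_x  <- data.frame(a = 11:20)

# Statistics estimated on the TRAINING data only.
center <- colMeans(train_x)
spread <- apply(train_x, 2, sd)

# Test set transformed with the training statistics (what the model expects).
test_train_stats <- scale(test_x, center = center, scale = spread)

# Test set transformed with its own statistics (a fresh preprocessing fit on the test set).
test_own_stats <- scale(test_x)

print(mean(test_train_stats))  # well above zero: the shift is preserved
print(mean(test_own_stats))    # approximately zero: the shift is erased
```

In this toy case the second transformation discards the offset between the two sets, so a model fitted on training-scaled inputs would see test inputs on a different footing.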

Working with text classification and big sparse matrices in R

Question: I'm working on a multi-class text classification project, and I need to build the document/term matrices and train and test in R. I already have datasets that don't fit within the dimensionality limits of R's base matrix class, so I would need to build big sparse matrices to be able to classify, for example, 100k tweets. I am using the quanteda package, as it has so far been more useful and reliable than the tm package, where creating a DocumentTermMatrix with a dictionary makes

Selecting tuning parameters with caret using standard deviation of custom metric

Question: I'm using caret with a custom fitting metric, but I need to maximize not just this metric but the lower bound of its confidence interval. So I'd like to maximize something like mean(metric) - k * stddev(metric). I know how to do this manually, but is there a way to tell caret to automatically select the best parameters using this function?
Answer 1: Yes, you can define your own selection metric through the "summaryFunction" parameter of your "trainControl" object and then with the "metric" parameter of
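
As a rough sketch of the mean(metric) - k * stddev(metric) rule, the base R function below picks a row from a resampling summary table. It is written to match the `(x, metric, maximize)` signature that trainControl's `selectionFunction` argument accepts, which is a different route from the summaryFunction approach the answer mentions. The factory name `make_lower_bound_selector` and the toy results table are hypothetical. The code assumes the summary has columns named after the metric and the metric with an "SD" suffix.

```r
# Hypothetical factory: returns a selection rule that scores each
# tuning-parameter row by mean - k * sd (or mean + k * sd when minimizing).
# Assumption: `x` has columns <metric> and <metric>SD.
make_lower_bound_selector <- function(k = 1) {
  function(x, metric, maximize) {
    m <- x[[metric]]
    s <- x[[paste0(metric, "SD")]]
    if (maximize) which.max(m - k * s) else which.min(m + k * s)
  }
}

# Made-up resampling summary for three candidate values of a tuning parameter.
results <- data.frame(
  mtry       = c(2, 4, 6),
  Accuracy   = c(0.80, 0.82, 0.81),
  AccuracySD = c(0.01, 0.06, 0.01)
)

select_fn <- make_lower_bound_selector(k = 1)
best_row <- select_fn(results, metric = "Accuracy", maximize = TRUE)
print(results[best_row, ])
```

Here the row with the highest mean accuracy (mtry = 4) loses because of its large spread. Under the stated assumptions, `select_fn` could be supplied as `trainControl(selectionFunction = select_fn)`.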

How to customize a model in caret to perform a PLS-[Classifier] two-step classification model?

Question: This question is a continuation of the same thread here. Below is a minimal working example taken from this book: Wehrens R. Chemometrics with R: multivariate data analysis in the natural sciences and life sciences. 1st edition. Heidelberg; New York: Springer. 2011. (page 250). The example comes from the book and its package ChemometricsWithR. It highlights some pitfalls when modeling using cross-validation techniques. The Aim: A cross-validated methodology using the same set of

How to change metrics using the library(caret)?

Question: I would like to change the metric from RMSE to RMSLE using the caret library. Given some sample data:
    ivar1 <- rnorm(500, mean = 3, sd = 1)
    ivar2 <- rnorm(500, mean = 4, sd = 1)
    ivar3 <- rnorm(500, mean = 5, sd = 1)
    ivar4 <- rnorm(500, mean = 4, sd = 1)
    dvar <- rpois(500, exp(3 + 0.1*ivar1 - 0.25*ivar2))
    data <- data.frame(dvar, ivar4, ivar3, ivar2, ivar1)
    ctrl <- rfeControl(functions=rfFuncs, method="cv", repeats = 5, verbose = FALSE, number=5)
    model <- rfe(data[,2:4], data[,1], sizes=c(1:4), rfeControl=ctrl)
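
A minimal sketch of an RMSLE summary function follows, written in base R with the `function(data, lev, model)` signature that caret summary functions use. The names `rmsle` and `rmsle_summary` are hypothetical. The code assumes non-negative predictions and observations, since log1p() is undefined below -1, and the toy data is made up.

```r
# Root mean squared logarithmic error; log1p(x) = log(1 + x) handles zeros.
# Assumption: pred and obs are non-negative.
rmsle <- function(pred, obs) {
  sqrt(mean((log1p(pred) - log1p(obs))^2))
}

# Hypothetical summary function in caret's (data, lev, model) shape.
# Assumption: `data` has columns `obs` and `pred`.
rmsle_summary <- function(data, lev = NULL, model = NULL) {
  c(RMSLE = rmsle(data$pred, data$obs))
}

# Made-up check: every prediction is off by exactly 1 on the log1p scale.
toy <- data.frame(
  obs  = c(exp(1) - 1, exp(2) - 1),
  pred = c(0,          exp(1) - 1)
)
print(rmsle_summary(toy))
```

With rfe, one option might be to copy rfFuncs, replace its summary element with `rmsle_summary`, and then call rfe with `metric = "RMSLE"` and `maximize = FALSE`. That wiring is an untested assumption here.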

Different results with formula and non-formula for caret training

Question: I noticed that using the formula and non-formula methods in caret while training produces different results. Also, the time taken by the formula method is almost 10x the time taken by the non-formula method. Is this expected?
    > z <- data.table(c1=sample(1:1000,1000, replace=T), c2=as.factor(sample(LETTERS, 1000, replace=T)))
    # SYSTEM TIME WITH FORMULA METHOD
    # -------------------------------
    > system.time(r <- train(c1 ~ ., z, method="rf", importance=T))
       user  system elapsed
    376.233   9.241  18.190
    >
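
One plausible contributor to the difference is how the factor column is presented to the model. The base R sketch below assumes that caret's formula interface expands factors into dummy variables much as model.matrix() does, while the non-formula interface hands the factor over as a single column. It uses a small made-up data frame instead of the question's data.table.

```r
# Made-up data: one numeric outcome and one 26-level factor predictor.
z <- data.frame(
  c1 = 1:52,
  c2 = factor(rep(LETTERS, 2), levels = LETTERS)
)

# Formula-style design matrix: intercept plus one dummy column per
# non-reference level of c2.
x_formula <- model.matrix(c1 ~ ., data = z)

# Non-formula style: the factor stays a single predictor column.
x_plain <- z[, "c2", drop = FALSE]

print(ncol(x_formula))
print(ncol(x_plain))
```

A tree-based learner given many binary dummy columns searches a different set of splits than one given a single multi-level factor, which could account for both the differing results and the differing run time.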

Parallel processing with xgboost and caret

Question: I want to parallelize the model fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the nthread parameter controls the number of threads used while fitting the models, that is, it builds the trees in parallel. caret's train function parallelizes at a different level, for example by running a separate process for each iteration of a k-fold CV. Is this understanding correct? If so, is it better to: Register the number of cores (for