r-caret

R caret model evaluation with alternate performance metric

百般思念 posted on 2019-12-22 05:09:09
Question: I'm using R's caret package to do some grid search and model evaluation. I have a custom evaluation metric that is a weighted average of absolute error. Weights are assigned at the observation level.

    X <- c(1,1,2,0,1) # feature 1
    w <- c(1,2,2,1,1) # weights
    Y <- 1:5          # target, continuous

    # assume I run a model using X as features and Y as target and get a vector of predictions
    mymetric <- function(predictions, target, weights){
      v <- sum(abs(target-predictions)*weights)/sum(weights)
      return(v)
    }
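
One way such a metric could be handed to caret is through a summary function with the (data, lev, model) signature that trainControl accepts. The sketch below is an illustration, not the asker's solution: the name weightedMAESummary and the toy data frame are made up, and it assumes the data frame passed to the summary function carries a `weights` column when case weights are supplied to train().

```r
# Weighted mean absolute error, as defined in the question.
mymetric <- function(predictions, target, weights) {
  sum(abs(target - predictions) * weights) / sum(weights)
}

# Hypothetical summary function in the shape caret expects:
# (data, lev, model) -> named numeric vector.
# Assumption: `data$weights` is present when weights are passed to train();
# if it is absent, fall back to equal weights.
weightedMAESummary <- function(data, lev = NULL, model = NULL) {
  w <- if (is.null(data$weights)) rep(1, nrow(data)) else data$weights
  c(wMAE = mymetric(data$pred, data$obs, w))
}

# Stand-alone check on a hand-made data frame (no caret needed).
toy <- data.frame(obs = 1:5, pred = c(2, 2, 3, 4, 5), weights = c(1, 2, 2, 1, 1))
result <- weightedMAESummary(toy)
print(result)

# With caret (not run here):
# ctrl <- caret::trainControl(method = "cv", number = 5,
#                             summaryFunction = weightedMAESummary)
# fit <- caret::train(x = data.frame(X = X), y = Y, weights = w, method = "lm",
#                     metric = "wMAE", maximize = FALSE, trControl = ctrl)
```

Because a lower weighted error is better, the train() call would need maximize = FALSE, as in the commented usage.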

Error: Package “ggplot2” could not be found, when loading the caret package

不问归期 posted on 2019-12-22 05:06:39
Question: When I install and load caret with the following, I get an error:

    install.packages("caret", dependencies=c("Depends", "Suggests"))
    library(caret)
    ## Loading required package: lattice
    ## Loading required package: ggplot2
    Error in LoadNamespace(i, c(lib.loc, .libPaths()), versionCheck=vI[[i]]): there is no package called 'digest'
    Error: package 'ggplot2' could not be loaded.

So I tried to resolve the issue with the digest package by installing caret using this code, and what do I get again:

    install.packages("caret", dep="TRUE")
    library(caret)

Parallel execution of train in caret fails with function not found

三世轮回 posted on 2019-12-22 04:16:28
Question: Yesterday I updated my R packages, and since then parallel execution of the train function fails. It seems that some functions called from within the workers are not available. These functions are at least flatTable and probFunction. I am experiencing this issue on my production machine and was able to reproduce it on a clean Windows 7 x64 VM. I added a minimal working example below. Dear users of Stack Overflow: any help is appreciated!

    # R 3.0.2 x64, RStudio Version 0.98.490, Windows

Difference between predict(model) and predict(model$finalModel) using caret for classification in R

人走茶凉 posted on 2019-12-22 04:12:33
Question: What's the difference between

    predict(rf, newdata=testSet)

and

    predict(rf$finalModel, newdata=testSet)

I train the model with preProcess=c("center", "scale"):

    tc <- trainControl("repeatedcv", number=10, repeats=10, classProbs=TRUE, savePred=T)
    rf <- train(y~., data=trainingSet, method="rf", trControl=tc, preProc=c("center", "scale"))

and I receive 0 true positives when I run it on a centered and scaled testSet:

    testSetCS <- testSet
    xTrans <- preProcess(testSetCS)
    testSetCS<- predict(xTrans,
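
For context, calling predict() on the train object applies the stored preprocessing to newdata, while calling it on finalModel does not. A related pitfall is centering and scaling the test set with its own statistics instead of the training statistics. The base R sketch below illustrates only that second point, using made-up numbers and no caret; it is not taken from the question.

```r
# Made-up single-feature training and test sets.
train_x <- data.frame(a = 1:10)
test_x  <- data.frame(a = 11:15)

# Center/scale statistics learned on the TRAINING data.
mu <- mean(train_x$a)
s  <- sd(train_x$a)

# Test set transformed with training statistics (what the model was fit on).
test_from_train <- (test_x$a - mu) / s

# Test set transformed with its OWN statistics (a different feature space).
test_from_self <- as.numeric(scale(test_x$a))

print(round(test_from_train, 3))
print(round(test_from_self, 3))
```

The two vectors differ, so a model trained on features scaled with the training statistics sees shifted inputs when the test set is rescaled on its own.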

Working with text classification and big sparse matrices in R

不问归期 posted on 2019-12-21 22:22:49
Question: I'm working on a multi-class text classification project, and I need to build the document/term matrices and train and test in R. I already have datasets that don't fit within the limited dimensionality of R's base matrix class, so I would need to build big sparse matrices to be able to classify, for example, 100k tweets. I am using the quanteda package, as it has so far been more useful and reliable than the tm package, where creating a DocumentTermMatrix with a dictionary makes

Selecting tuning parameters with caret using standard deviation of custom metric

五迷三道 posted on 2019-12-21 20:22:35
Question: I'm using caret with a custom fitting metric, but I need to maximize not just this metric but the lower bound of its confidence interval. So I'd like to maximize something like mean(metric) - k * stddev(metric). I know how to do this manually, but is there a way to tell caret to automatically select the best parameters using this function?

Answer 1: Yes, you can define your own selection metric through the "summaryFunction" parameter of your "trainControl" object and then with the "metric" parameter of
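
A different route from the one the answer starts to describe is trainControl's selectionFunction argument, which receives the resampling results table along with the metric name and a maximize flag, and returns the index of the chosen row. The sketch below assumes the results table holds the resampled mean in a column named after the metric and its standard deviation in a column with an "SD" suffix; the function name lowerBoundSelect, the multiplier k, and the toy table are all made up for illustration.

```r
k <- 1  # hypothetical penalty multiplier on the resampling standard deviation

# Hypothetical selection function with the (x, metric, maximize) signature.
# Assumption: x has columns <metric> (mean) and <metric>SD (std. deviation).
lowerBoundSelect <- function(x, metric, maximize) {
  m  <- x[, metric]
  sd <- x[, paste0(metric, "SD")]
  if (maximize) {
    which.max(m - k * sd)   # best pessimistic (lower-bound) score
  } else {
    which.min(m + k * sd)   # best pessimistic (upper-bound) score
  }
}

# Toy results table standing in for the resampling summary.
toy <- data.frame(mtry       = c(2, 4, 6),
                  Accuracy   = c(0.80, 0.85, 0.84),
                  AccuracySD = c(0.02, 0.10, 0.03))
best_row <- lowerBoundSelect(toy, metric = "Accuracy", maximize = TRUE)
print(toy[best_row, ])

# With caret (not run here):
# ctrl <- caret::trainControl(method = "cv", number = 10,
#                             selectionFunction = lowerBoundSelect)
```

In the toy table the row with the highest mean accuracy is not selected, because its large standard deviation pulls its lower bound below that of the third row.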

How to customize a model in CARET to perform a PLS-[Classifier] two-step classification model?

你离开我真会死。 posted on 2019-12-21 05:10:21
Question: This question is a continuation of the same thread here. Below is a minimal working example taken from this book: Wehrens R. Chemometrics with R multivariate data analysis in the natural sciences and life sciences. 1st edition. Heidelberg; New York: Springer. 2011. (page 250). The example comes from the book and its package ChemometricsWithR, and it highlights some pitfalls when modeling using cross-validation techniques. The Aim: A cross-validated methodology using the same set of

How to change metrics using the library(caret)?

送分小仙女□ posted on 2019-12-21 03:42:22
Question: I would like to change the metric from RMSE to RMSLE using the caret library. Given some sample data:

    ivar1<-rnorm(500, mean = 3, sd = 1)
    ivar2<-rnorm(500, mean = 4, sd = 1)
    ivar3<-rnorm(500, mean = 5, sd = 1)
    ivar4<-rnorm(500, mean = 4, sd = 1)
    dvar<-rpois(500, exp(3+ 0.1*ivar1 - 0.25*ivar2))
    data<-data.frame(dvar,ivar4,ivar3,ivar2,ivar1)
    ctrl <- rfeControl(functions=rfFuncs, method="cv", repeats = 5, verbose = FALSE, number=5)
    model <- rfe(data[,2:4], data[,1], sizes=c(1:4), rfeControl=ctrl)
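
A minimal sketch of an RMSLE summary function follows, assuming the usual (data, lev, model) signature with `obs` and `pred` columns. The name rmsleSummary is made up, and clamping negative predictions to zero before taking logs is an assumption of this sketch rather than part of any standard definition.

```r
# Hypothetical RMSLE summary function: root mean squared error on the log1p scale.
# Assumption: negative predictions are clamped to 0 so that log1p is defined.
rmsleSummary <- function(data, lev = NULL, model = NULL) {
  pred <- pmax(data$pred, 0)
  c(RMSLE = sqrt(mean((log1p(pred) - log1p(data$obs))^2)))
}

# Stand-alone checks on hand-made data (no caret needed).
perfect <- rmsleSummary(data.frame(obs = c(0, 1, 3), pred = c(0, 1, 3)))
one_off <- rmsleSummary(data.frame(obs = exp(1) - 1, pred = 0))
print(perfect)
print(one_off)

# With caret's rfe (not run here), the summary slot of the function list
# could be swapped and the metric named explicitly:
# funcs <- caret::rfFuncs
# funcs$summary <- rmsleSummary
# ctrl  <- caret::rfeControl(functions = funcs, method = "cv", number = 5)
# model <- caret::rfe(data[, 2:4], data[, 1], sizes = c(1:4),
#                     metric = "RMSLE", maximize = FALSE, rfeControl = ctrl)
```

Since RMSLE is an error measure, maximize = FALSE is needed so that smaller values are preferred.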

Different results with formula and non-formula for caret training

微笑、不失礼 posted on 2019-12-21 02:36:24
Question: I noticed that using the formula and non-formula methods in caret while training produces different results. Also, the time taken by the formula method is almost 10x the time taken by the non-formula method. Is this expected?

    > z <- data.table(c1=sample(1:1000,1000, replace=T), c2=as.factor(sample(LETTERS, 1000, replace=T)))
    # SYSTEM TIME WITH FORMULA METHOD
    # -------------------------------
    > system.time(r <- train(c1 ~ ., z, method="rf", importance=T))
       user  system elapsed
    376.233   9.241  18.190
    >
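
One possible contributor, which this excerpt does not confirm, is that the formula interface builds a design matrix and expands each factor into dummy columns, whereas the non-formula interface passes the factor through as a single column. The base R sketch below shows that expansion on a made-up three-level factor; it does not reproduce the timings above.

```r
# Made-up data: one numeric outcome and one three-level factor predictor.
df <- data.frame(c1 = 1:6,
                 c2 = factor(c("A", "B", "C", "A", "B", "C")))

# Formula route: the factor becomes an intercept plus one dummy column
# per non-reference level.
mm <- model.matrix(c1 ~ ., data = df)
print(colnames(mm))

# Non-formula route: the predictor stays a single factor column.
x <- df[, "c2", drop = FALSE]
print(ncol(x))
```

With the 26-level factor in the question, the same expansion would give a model many more predictor columns to search over than the single factor column does.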

Parallel processing with xgboost and caret

青春壹個敷衍的年華 posted on 2019-12-20 19:34:39
Question: I want to parallelize the model fitting process for xgboost while using caret. From what I have seen in xgboost's documentation, the nthread parameter controls the number of threads used while fitting a model, that is, building the trees in parallel. caret's train function parallelizes at a different level, for example by running a separate process for each iteration of a k-fold CV. Is this understanding correct? If yes, is it better to: Register the number of cores (for
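
The two levels of parallelism multiply: each resampling worker can start its own xgboost threads, so workers times threads should stay within the available cores. The sketch below only does that arithmetic. The helper plan_threads is made up, and the commented caret usage assumes that extra arguments such as nthread given to train() are forwarded to xgboost, which this excerpt does not confirm.

```r
# Hypothetical helper: split a core budget between outer resampling workers
# and inner xgboost threads so that workers * nthread <= total_cores.
plan_threads <- function(total_cores, n_workers) {
  stopifnot(total_cores >= 1, n_workers >= 1)
  n_workers <- min(n_workers, total_cores)
  list(workers = n_workers,
       nthread = max(1, total_cores %/% n_workers))
}

plan <- plan_threads(total_cores = 8, n_workers = 4)
print(plan)

# Possible usage (not run here; requires doParallel, caret and xgboost):
# cl <- parallel::makeCluster(plan$workers)
# doParallel::registerDoParallel(cl)
# ctrl <- caret::trainControl(method = "cv", number = 5, allowParallel = TRUE)
# fit  <- caret::train(x, y, method = "xgbTree", trControl = ctrl,
#                      nthread = plan$nthread)  # assumed to reach xgboost
# parallel::stopCluster(cl)
```

Setting the worker count to 1 in this helper gives all cores to xgboost's threads, while setting it to the core count leaves one thread per worker.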