r-caret

How to solve the “The data cannot have more levels than the reference” error when using confusionMatrix?

自闭症网瘾萝莉.ら submitted on 2019-12-24 00:42:04

Question: I'm using R. I split the data into train and test sets to estimate prediction accuracy. This is my code:

    library("tree")
    credit <- read.csv("C:/Users/Administrator/Desktop/german_credit (2).csv")
    library("caret")
    set.seed(1000)
    intrain <- createDataPartition(y = credit$Creditability, p = 0.7, list = FALSE)
    train <- credit[intrain, ]
    test <- credit[-intrain, ]
    treemod <- tree(Creditability ~ ., data = train)
    plot(treemod)
    text(treemod)
    cv.trees <- cv.tree(treemod, FUN = prune.tree)
    plot(cv.trees)
    prune.trees <- prune.tree
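The error in the title typically means the prediction vector and the reference vector are factors with different level sets (or one of them is not a factor at all). A minimal sketch of the usual fix — coercing both to factors with identical levels before calling `confusionMatrix` (the toy vectors below are illustrative, not from the question):

```r
library(caret)

# Toy predictions and reference with mismatched types/levels
pred <- c("good", "bad", "good", "good")           # character predictions
ref  <- factor(c("good", "bad", "bad", "good"),
               levels = c("bad", "good"))          # factor reference

# Coerce predictions to a factor with the SAME levels as the reference
pred <- factor(pred, levels = levels(ref))

confusionMatrix(pred, ref)
```

If the reference accidentally carries extra (empty) levels, `droplevels()` on both vectors before the coercion also resolves the mismatch.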

Caret Binary Classification with RMSE

痴心易碎 submitted on 2019-12-24 00:30:19

Question: Is there a way to get caret to use RMSE with a binary classification problem? If you try metric = "RMSE" on a classification problem you will receive the message:

    Error in train.default(x, y, weights = w, ...) :
      Metric RMSE not applicable for classification models

Which makes sense. But is there a way to define a custom metric? For example, if your outcome is 0 or 1, you can define the error as outcome - p, where p is the probability predicted by the model. EDIT ====================
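caret does support custom metrics through the summaryFunction argument of trainControl. A hedged sketch of an RMSE-on-probabilities metric (the function name brierSummary and the two-class iris subset are illustrative assumptions, not from the question):

```r
library(caret)

# Custom summary: RMSE between the predicted class probability and the 0/1 outcome
brierSummary <- function(data, lev = NULL, model = NULL) {
  obs <- ifelse(data$obs == lev[2], 1, 0)   # 0/1 encoding of the observed class
  p   <- data[[lev[2]]]                     # predicted probability of class lev[2]
  c(ProbRMSE = sqrt(mean((obs - p)^2)))
}

ctrl <- trainControl(method = "cv", number = 5,
                     classProbs = TRUE,              # needed so probabilities exist
                     summaryFunction = brierSummary)

dat <- droplevels(iris[1:100, ])                     # two-class toy data
set.seed(1)
fit <- train(Species ~ ., data = dat,
             method = "glm", family = binomial,
             metric = "ProbRMSE", maximize = FALSE,  # smaller is better
             trControl = ctrl)
```

Note that classProbs = TRUE is required, and maximize = FALSE must be set explicitly since caret would otherwise try to maximize an unknown metric.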

How should I get the coefficients of Lasso Model?

佐手、 submitted on 2019-12-24 00:19:37

Question: Here is my code:

    library(MASS)
    library(caret)
    df <- Boston
    set.seed(3721)
    cv.10.folds <- createFolds(df$medv, k = 10)
    lasso_grid <- expand.grid(fraction = c(1, 0.1, 0.01, 0.001))
    lasso <- train(medv ~ ., data = df,
                   preProcess = c("center", "scale"),
                   method = "lasso",
                   tuneGrid = lasso_grid,
                   trControl = trainControl(method = "cv", number = 10,
                                            index = cv.10.folds))
    lasso

Unlike a linear model, I cannot find the coefficients of the lasso regression model in summary(lasso). How should I do that? Or maybe I
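For method = "lasso" (which wraps elasticnet::enet), the coefficients at the selected tuning value can usually be extracted from the final model via predict.enet with type = "coefficients" — a sketch, assuming a fitted train object named lasso as in the question:

```r
library(elasticnet)

# Coefficients of the final lasso fit at the fraction chosen by caret
coefs <- predict(lasso$finalModel,
                 type = "coefficients",
                 s    = lasso$bestTune$fraction,
                 mode = "fraction")

coefs$coefficients   # named coefficient vector; zeros are dropped predictors
```

The returned coefficients are on the preprocessed (centered and scaled) scale, since preProcess was applied before fitting.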

Difference in average AUC computation using ROCR and pROC (R)

风格不统一 submitted on 2019-12-23 17:32:02

Question: I am working with cross-validation data (10-fold, repeated 5 times) from an SVM-RFE model built with the caret package. I know that caret works with the pROC package when computing metrics, but I need to use the ROCR package in order to obtain the average ROC curve. However, I noticed that the average AUC values were not the same with each package, so I am not sure whether I can use the two interchangeably. The code I used to check this is:

    predictions_NG3 <- list()
    labels_NG3 <- list()
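On a single set of predictions the two packages should agree on AUC; discrepancies usually come from how curves are averaged across folds (vertical/threshold averaging in ROCR versus pooling or per-fold means), not from the AUC computation itself. A small check on synthetic scores (illustrative only):

```r
library(pROC)
library(ROCR)

set.seed(1)
labels <- rep(c(0, 1), each = 50)
scores <- labels + rnorm(100)        # noisy scores, class 1 scores higher

# AUC via pROC
auc_proc <- as.numeric(pROC::auc(labels, scores))

# AUC via ROCR
pred     <- ROCR::prediction(scores, labels)
auc_rocr <- ROCR::performance(pred, "auc")@y.values[[1]]

all.equal(auc_proc, auc_rocr)        # identical on the same fold
```

If the per-fold AUCs match but the averages do not, the difference lies in the averaging strategy applied afterwards rather than in either package's AUC.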

R: set.seed() results don't match if caret package loaded

大城市里の小女人 submitted on 2019-12-23 09:38:33

Question: I am using createFolds() in R (version 3.3.0) to create train/test partitions. To make the results reproducible, I used set.seed() with a seed value of 10. As expected, the generated folds were reproducible. But when I loaded the caret package just after setting the seed and then called createFolds, the created folds were different (although still reproducible). Specifically, the created folds differ in the following two cases:

    Case 1:
    library(caret)
    set.seed(10)
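Loading a package can itself consume random numbers (for example via load-time code in the package or its dependencies), which advances the RNG state and shifts everything sampled afterwards. The usual remedy, sketched below, is to call set.seed() immediately before the randomized call, after all library() calls:

```r
# Fragile: if loading the package draws from the RNG, the folds
# below will differ from a run without the library() call.
set.seed(10)
library(caret)
folds_a <- createFolds(1:100, k = 5)

# Robust: seed the RNG right before the randomized call.
library(caret)
set.seed(10)
folds_b <- createFolds(1:100, k = 5)
```

Both variants are individually reproducible; only the second is insensitive to whatever happens during package loading.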

caret: using random forest and including cross-validation

为君一笑 submitted on 2019-12-22 15:14:04

Question: I used the caret package to train a random forest, including repeated cross-validation. I'd like to know whether the OOB estimate, as in Breiman's original RF, is still used, or whether it is replaced by the cross-validation. If it is replaced, do I keep the advantages described in Breiman (2001), such as increased accuracy from reduced correlation between the input data? Since OOB samples are drawn with replacement and CV folds are drawn without replacement, are the two procedures comparable? What is the OOB estimate of
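In caret, cross-validation is used to select the tuning parameter (mtry), but the final randomForest fit still carries its own Breiman-style OOB estimate, so the two numbers coexist rather than one replacing the other. A sketch showing where each lives:

```r
library(caret)
library(randomForest)

set.seed(42)
fit <- train(Species ~ ., data = iris,
             method = "rf",
             trControl = trainControl(method = "repeatedcv",
                                      number = 10, repeats = 3))

fit$results       # repeated-CV accuracy used to choose mtry
fit$finalModel    # the underlying randomForest object, printed with its OOB error

# OOB error of the final fit after all trees (default ntree = 500)
fit$finalModel$err.rate[fit$finalModel$ntree, "OOB"]
```

The CV estimate measures performance of the whole tuning procedure; the OOB estimate measures the single final forest.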

“length of 'dimnames' [1] not equal to array extent” error in linear regression summary in R

偶尔善良 submitted on 2019-12-22 08:21:02

Question: I'm running a straightforward linear regression fit on the following data frame:

    > str(model_data_rev)
    'data.frame': 128857 obs. of 12 variables:
     $ ENTRY_4 : num 186 218 208 235 256 447 471 191 207 250 ...
     $ ENTRY_8 : num 724 769 791 777 707 237 236 726 773 773 ...
     $ ENTRY_12: num 2853 2989 3174 3027 3028 ...
     $ ENTRY_16: num 2858 3028 3075 2992 3419 ...
     $ ENTRY_20: num 7260 7188 7587 7560 7165 ...
     $ EXIT_4  : num 70 82 105 114 118 204 202 99 73 95 ...
     $ EXIT_8  : num 1501 1631 1594 1576

R package caret: print iteration when using parallel

孤街醉人 submitted on 2019-12-22 07:00:42

Question: Is there any way to print the iteration number when using the caret::train function in parallel? I know there is an option called verbose, but it doesn't seem to print anything when I use multiple cores.

Answer 1: I've found a solution. All we need is to register the cores via the makeCluster function:

    library(doSNOW)
    cl <- makeCluster(30, outfile = "")
    registerDoSNOW(cl)

This way, the log is printed to the console. I've tested it with plain R, RStudio, and RServer on macOS, Windows, and Ubuntu (even AWS). For example, iris <- iris[1
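Putting the answer together with a training call: the outfile = "" argument redirects worker output to the master console, which is what makes per-iteration messages visible under parallel execution (the cluster size and toy model below are illustrative):

```r
library(caret)
library(doSNOW)

cl <- makeCluster(4, outfile = "")   # outfile = "" forwards worker output to the console
registerDoSNOW(cl)

set.seed(1)
fit <- train(Species ~ ., data = iris,
             method = "rf",
             trControl = trainControl(method = "cv", number = 5,
                                      verboseIter = TRUE))   # per-iteration progress

stopCluster(cl)                      # release the workers when done
```

Without outfile = "", the workers' stdout is discarded, so verboseIter appears to do nothing even though it is running.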

Can't install the caret package in R (in my Linux machine)

旧时模样 submitted on 2019-12-22 05:29:20

Question: I am facing the following errors while trying to install the caret package in R:

    g++: error: /tmp/Rtmp2Tos7n/R.INSTALL2e6e30153a74/nloptr/nlopt-2.4.2/lib/libnlopt_cxx.a: No such file or directory
    make: *** [nloptr.so] Error 1
    ERROR: compilation failed for package ‘nloptr’
    * removing ‘/rmt/csfiles/pgrads/mava290/R/x86_64-suse-linux-gnu-library/3.1/nloptr’
    Warning in install.packages :
      installation of package ‘nloptr’ had non-zero exit status
    ERROR: dependency ‘nloptr’ is not available for
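The failure is not in caret itself but in its dependency nloptr, which compiles against the system NLopt library. The usual fix is to install the system development package first, then retry from R — a sketch (the library path in the error suggests openSUSE; the exact system package names are assumptions and vary by distribution):

```r
# Before running this, install the NLopt development headers from a shell:
#   sudo zypper install nlopt-devel       # openSUSE (assumed from the path)
#   sudo apt-get install libnlopt-dev     # Debian/Ubuntu equivalent
# Then retry the R installation, dependency first:
install.packages("nloptr")
install.packages("caret", dependencies = TRUE)
```

If no system package is available, building NLopt from source and pointing nloptr at it via its configure arguments is the fallback.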