rpart

What does the rpart “Error in as.character(x) : cannot coerce type 'builtin' to vector of type 'character'” message mean?

核能气质少年 submitted on 2019-12-23 19:53:03
Question: I've been banging my head against rpart for a few days now (trying to make classification trees for a dataset that I have), and I think it's time to ask for a lifeline at this point :-) I'm sure it's something silly that I'm not seeing, but here's what I've been doing:

    EuropeWater <- read.csv(file=paste("/Users/artessaniccola/Documents/", "Magic Briefcase/CityTypology/Europe_water.csv", sep=""))
    library(rpart)
    attach(EuropeWater)
    names(EuropeWater)
    [1] "City" "waterpercapita_m3" "water_class"

rpart plot text shorter

…衆ロ難τιáo~ submitted on 2019-12-22 06:37:48
Question: I am using the prp function from the rpart.plot package to plot a tree. For categorical data such as states, it prints a really long list of variables, which makes the plot less readable. Is there any way to wrap the text onto two or more lines if it exceeds some length?

Answer 1: Here's an example that wraps long split labels over multiple lines. The maximum length of each line is 25 characters; change the 25 to suit your purposes. (This example is derived from Section 6.1 of the rpart.plot vignette.)

    tree <- rpart

How do I interpret rpart splits on factor variables when building classification trees in R?

无人久伴 submitted on 2019-12-21 16:19:35
Question: If the factor variable is Climate, with 4 possible values (Tropical, Arid, Temperate, Snow), and a node in my rpart tree is labeled "Climate:ab", what is the split?

Answer 1: I assume you plot the tree the standard way, which is

    plot(f)
    text(f)

As the help for text.rpart explains (see the pretty argument), factor levels are shown as letters by default, so a means levels(Climate)[1]. A label of a alone therefore means that observations with Climate==levels(Climate)[1] go to the left node and all others go to the right. You could
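
The letter coding described above can be checked without a plot. The following is a minimal sketch, assuming a made-up toy data frame (the `toy` data, its `y` values, and the level order are all hypothetical, not the asker's data). It prints the letter-to-level key and compares the default labels with the spelled-out ones from `labels()` with `pretty = 0`; `text(f, pretty = 0)` is commonly used for the same effect on the plot.

```r
library(rpart)

# Hypothetical toy data: 10 observations per climate, response differs by group.
climate_levels <- c("Tropical", "Arid", "Temperate", "Snow")
toy <- data.frame(
  Climate = factor(rep(climate_levels, each = 10), levels = climate_levels),
  y = rep(c(10, 10, 1, 1), each = 10) + rep(c(-0.1, 0.1), times = 20)
)

fit <- rpart(y ~ Climate, data = toy)

# Letter codes follow the order of levels(): a = first level, b = second, ...
letter_key <- setNames(levels(toy$Climate), letters[seq_along(levels(toy$Climate))])
print(letter_key)

# Default labels abbreviate levels to letters; pretty = 0 spells them out.
short_labels <- labels(fit)
full_labels <- labels(fit, pretty = 0)
print(short_labels)
print(full_labels)
```

Because the letters depend on the order of `levels()`, which is alphabetical unless set explicitly, printing the key first avoids misreading a label such as "ab".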

Caret train method complains Something is wrong; all the RMSE metric values are missing

亡梦爱人 submitted on 2019-12-19 17:45:17
Question: On numerous occasions I've gotten this error when trying to fit a gbm or rpart model. I was finally able to reproduce it consistently using publicly available data. I have noticed that the error happens when using CV (or repeated CV); when I don't use any fit control, I don't get it. Can someone shed some light on why I keep getting this error?

    fitControl = trainControl("repeatedcv", repeats=5)
    ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
    ds$sub = as.factor(ds

How to get terminal nodes for a new observation from an rpart object?

廉价感情. submitted on 2019-12-19 04:15:10
Question: Say I have

    head(kyphosis)
    inTrain <- sample(1:nrow(kyphosis), 45, replace = F)
    TRAIN_KYPHOSIS <- kyphosis[inTrain,]
    TEST_KYPHOSIS <- kyphosis[-inTrain,]
    (kyph_tree <- rpart(Number ~ ., data = TRAIN_KYPHOSIS))

How do I get the terminal node from the fitted object for each observation in TEST_KYPHOSIS? How do I get a summary, such as the deviance and the predicted value, from the terminal node that each test observation maps to?

Answer 1: rpart actually has this functionality but it's not exposed
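
The answer above is cut off, so the following is not the answerer's method. It is a sketch of a different workaround that relies only on the public `predict()` method: for a regression tree, `predict()` returns the `yval` of the node each row lands in, so a copy of the tree whose `yval` holds the node numbers reports the terminal node instead. The seed and the lower-case variable names are assumptions made for illustration.

```r
library(rpart)

set.seed(42)  # assumption: fixed seed so the train/test split is reproducible
in_train <- sample(seq_len(nrow(kyphosis)), 45)
train_kyphosis <- kyphosis[in_train, ]
test_kyphosis <- kyphosis[-in_train, ]

kyph_tree <- rpart(Number ~ ., data = train_kyphosis)

# Work on a copy whose fitted value in every node is the node number itself.
node_tree <- kyph_tree
node_tree$frame$yval <- as.numeric(rownames(node_tree$frame))

# predict() now returns the terminal node number each test row falls into.
test_nodes <- predict(node_tree, newdata = test_kyphosis)

# Look up size, deviance and predicted value of those nodes in the original tree.
node_summary <- data.frame(
  node = unname(test_nodes),
  kyph_tree$frame[as.character(test_nodes), c("n", "dev", "yval")],
  row.names = NULL
)
print(head(node_summary))
```

The rows of `node_summary` are in the same order as the rows of `test_kyphosis`. The trick as written applies to regression trees; a classification tree would need a different lookup.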

Search for corresponding node in a regression tree using rpart

谁都会走 submitted on 2019-12-18 04:15:33
Question: I'm pretty new to R and I'm stuck on a pretty dumb problem. I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting. Thanks to R, the calibration part is easy to do and easy to control.

    # the rpart package is needed
    library(rpart)
    # Load a big data file used for calibration
    my_data <- read.csv("my_file.csv", sep=",", header=TRUE)
    # Regression tree calibration
    tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 + Attribute4

Warning message: “missing values in resampled performance measures” in caret train() using rpart

谁都会走 submitted on 2019-12-17 15:56:06
Question: I am using the caret package to train a model with the "rpart" package:

    tr = train(y ~ ., data = trainingDATA, method = "rpart")

The data has no missing values or NAs, but running the command produces a warning:

    Warning message:
    In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
    There were missing values in resampled performance measures.

Does anyone know (or could point me to where to find an answer) what this warning means? I know it is telling me that there

Apply weights in rpart model gives error

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-14 02:28:28
Question: I'm using the rpart package to fit some models, like this:

    fitmodel = function(formula, data, w) {
      fit = rpart(formula, data, weights = w)
    }

Calling the custom function

    fit = fitmodel(y ~ x1 + x2, data, w)

causes the error:

    Error in eval(expr, envir, enclos) : object 'w' not found

Then I decided to use

    fitmodel = function(formula, data, w) {
      data$w = w
      fit = rpart(formula, data, weights = w)
    }

This works, but there's another problem. This will work:

    fit = fitmodel(y ~ x1 + x2, data, w)

This
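
The question is cut off before its second problem, so this sketch only addresses the first error. `rpart` builds its model frame with `model.frame()`, which looks for `weights` first in `data` and then in the environment of the formula; a formula written at the top level does not see the function's local `w`. One possible workaround, shown here under that assumption, is to point the formula's environment at the function's own frame. The name `fit_model`, the use of `mtcars`, and the weights are illustrative only.

```r
library(rpart)

# Sketch: make `w` visible to the formula's environment, where model.frame()
# looks for `weights` after it has searched `data`.
fit_model <- function(formula, data, w) {
  environment(formula) <- environment()
  rpart(formula, data = data, weights = w)
}

# Assumed example data: the built-in mtcars set with arbitrary case weights.
case_weights <- rep(c(1, 2), length.out = nrow(mtcars))
fit <- fit_model(mpg ~ wt + hp, mtcars, case_weights)
print(fit)
```

Unlike `data$w = w`, this does not add or overwrite a column in the data. The trade-off is that any object the formula refers to outside `data` is now looked up starting from the function's frame rather than from where the formula was written.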

The tuning parameter in “glm” vs “rf”

给你一囗甜甜゛ submitted on 2019-12-13 07:43:25
Question: I am trying to build a classification model using method = "glm" in train. When I use method = "rpart" it works fine, but when I switch to method = "glm" it gives me an error saying

    The tuning parameter grid should have columns parameter

I tried using

    cpGrid = data.frame(.0001)

and also

    cpGrid = data.frame(expand.grid(.cp = seq(.0001, .09, .001)))

but both throw an error. Below is my initial code:

    numFolds = trainControl(method = "cv", number = 10, repeats = 3)
    cpGrid = expand.grid(.cp =

Is rpart automatic pruning?

孤街浪徒 submitted on 2019-12-12 10:44:44
Question: Does rpart prune automatically? The decision tree produced by rpart has many more levels than the one produced by Oracle Data Mining, which prunes automatically.

Answer 1: No, but the defaults for the fitting function may stop splitting "early" (for some definition of "early"). See ?rpart.control for the parameters you can tweak. In particular, see the arguments minsplit and minbucket in that help file. These are stopping rules that prevent any node from being split if those conditions are not met. You
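
To illustrate the distinction the answer draws between stopping rules and pruning, here is a minimal sketch on the `kyphosis` data that ships with rpart. It relaxes the stopping rules (the defaults are `minsplit = 20`, `minbucket = round(minsplit/3)` and `cp = 0.01`) to grow a large tree, then prunes it explicitly with `prune()`. Choosing the `cp` with the lowest cross-validated error is one common convention, used here as an assumption and not something the answer prescribes.

```r
library(rpart)

set.seed(1)  # cross-validated errors in cptable depend on random folds

# Relax the stopping rules so the tree grows as far as the data allow.
full_tree <- rpart(
  Kyphosis ~ Age + Number + Start,
  data = kyphosis,
  control = rpart.control(minsplit = 2, minbucket = 1, cp = 0)
)

# Pruning is a separate, explicit step: pick the cp with the lowest
# cross-validated error and cut the tree back to it.
best_row <- which.min(full_tree$cptable[, "xerror"])
best_cp <- full_tree$cptable[best_row, "CP"]
pruned_tree <- prune(full_tree, cp = best_cp)

cat("nodes before pruning:", nrow(full_tree$frame), "\n")
cat("nodes after pruning: ", nrow(pruned_tree$frame), "\n")
```

`printcp(full_tree)` shows the same table of `cp` values and cross-validated errors that the sketch reads from `cptable`.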