rpart

How to apply weights in rpart?

Submitted by 别说谁变了你拦得住时间么 on 2019-12-10 10:57:11
Question: I have this data on houses from the Kaggle practice competition, and I'm using rpart to train a simple first model to predict the sale price. The model is not correctly identifying sales where the sale condition was abnormal or a down payment. Therefore, I'd like to increase the importance of this variable, which is obviously overlooked in the model. I'm assuming this is done by using the "weights" parameter, but how is this parameter used? How can I pinpoint which variables I want to have a higher weight?
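A minimal sketch of how case weights are actually passed to rpart. The data frame name `train` and the column `SaleCondition` are assumptions based on the Kaggle house-prices data, not taken from the question:

```r
library(rpart)

# Assumed setup: `train` is the Kaggle training data with a SaleCondition
# column. Upweighting abnormal-sale rows makes the split search pay more
# attention to them.
w <- ifelse(train$SaleCondition == "Abnorml", 5, 1)
fit <- rpart(SalePrice ~ ., data = train, weights = w, method = "anova")
```

Note that `weights` are per-observation case weights, not per-variable importances; if the goal is to favor splits on particular variables, rpart's `cost` argument (one non-negative cost per predictor) is the relevant knob.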

rpart node assignment

Submitted by 让人想犯罪 __ on 2019-12-09 23:00:54
Question: Is it possible to extract the node assignment for a fitted rpart tree? What about when I apply the model to new data? The idea is that I would like to use the nodes as a way to cluster my data. In other packages (e.g. SPSS), I can save the predicted class, probabilities, and node number for further analysis. Given how powerful R can be, I imagine there is a simple solution to this.

Answer 1: Try using the partykit package:

library(rpart)
z.auto <- rpart(Mileage ~ Weight, car.test.frame)
library
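A hedged completion of that answer, continuing the excerpt's own example: the rpart fit already records node assignments for the training data, and converting to a partykit object gives node ids for new data as well.

```r
library(rpart)
library(partykit)

z.auto <- rpart(Mileage ~ Weight, car.test.frame)

# Training-data assignments come with the fit itself: fit$where holds,
# for each observation, the row of z.auto$frame it landed in.
head(z.auto$where)

# For new data, convert to a party object and ask for node ids:
pz <- as.party(z.auto)
predict(pz, newdata = car.test.frame[1:5, ], type = "node")
```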

Why do results using caret::train(…, method = “rpart”) differ from rpart::rpart(…)?

Submitted by 为君一笑 on 2019-12-09 15:57:19
Question: I'm taking part in the Coursera Practical Machine Learning course, and the coursework requires building predictive models using this dataset. After splitting the data into training and testing datasets, based on the outcome of interest (herewith labelled y, but in fact the classe variable in the dataset):

inTrain <- createDataPartition(y = data$y, p = 0.75, list = F)
training <- data[inTrain, ]
testing <- data[-inTrain, ]

I have tried 2 different methods: modFit <- caret::train(y ~ .,
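One likely source of the difference, sketched under the assumption that `training` and `y` are as defined in the excerpt: `caret::train` cross-validates over a small grid of `cp` values and refits with the winner, while `rpart::rpart` simply uses its default `cp = 0.01`. Pinning both calls to the same `cp` makes them comparable:

```r
library(caret)
library(rpart)

set.seed(1)
# caret, forced to the single cp value that plain rpart uses by default:
modFit <- train(y ~ ., data = training, method = "rpart",
                tuneGrid = data.frame(cp = 0.01))
# plain rpart with the matching control setting:
modRpart <- rpart(y ~ ., data = training,
                  control = rpart.control(cp = 0.01))
```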

Getting the observations in an rpart node (i.e. CART)

Submitted by 偶尔善良 on 2019-12-08 22:27:02
Question: I would like to inspect all the observations that reached some node in an rpart decision tree. For example, in the following code:

fit <- rpart(Kyphosis ~ Age + Start, data = kyphosis)
fit
n= 81

node), split, n, loss, yval, (yprob)
      * denotes terminal node

 1) root 81 17 absent (0.79012346 0.20987654)
   2) Start>=8.5 62 6 absent (0.90322581 0.09677419)
     4) Start>=14.5 29 0 absent (1.00000000 0.00000000) *
     5) Start< 14.5 33 6 absent (0.81818182 0.18181818)
      10) Age< 55 12 0 absent (1.00000000 0
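A sketch of one way to pull those rows out using only the fitted object. Terminal node 4 here matches the printed tree above; `fit$where` maps observations to leaves, so this works for terminal nodes:

```r
library(rpart)
fit <- rpart(Kyphosis ~ Age + Start, data = kyphosis)

# fit$where gives, for each training row, the row of fit$frame it ended
# up in; rownames(fit$frame) are the printed node numbers.
in_node4 <- fit$where == which(rownames(fit$frame) == "4")
kyphosis[in_node4, ]   # the 29 observations in terminal node 4
```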

How to get percentages from decision tree for each node

Submitted by 倖福魔咒の on 2019-12-07 19:17:45
Question: How could I create a table that includes the percentages for each node in the plot below?

library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
fit <- rpart(Species ~ ., data=iris, method="class")
fancyRpartPlot(fit)

It results in this plot: I would like to output a table with species as the first column and the associated percent at each node in a second column.
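A minimal sketch of such a table, built from the fitted object rather than the plot (this covers terminal nodes, since each observation lands in exactly one leaf):

```r
library(rpart)
fit <- rpart(Species ~ ., data = iris, method = "class")

# Leaf node number for every observation, then class proportions per leaf:
leaf <- rownames(fit$frame)[fit$where]
round(prop.table(table(Species = iris$Species, node = leaf), margin = 2), 2)
```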

R caret predict returns fewer outputs than inputs

Submitted by ε祈祈猫儿з on 2019-12-07 06:36:30
Question: I used caret to train an rpart model below.

trainIndex <- createDataPartition(d$Happiness, p=.8, list=FALSE)
dtrain <- d[trainIndex, ]
dtest <- d[-trainIndex, ]
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv", number=10, repeats=10)
fitRpart <- train(Happiness ~ ., data=dtrain, method="rpart", trControl = fitControl)
testRpart <- predict(fitRpart, newdata=dtest)

dtest contains 1296 observations, so I expected testRpart to produce a vector of length 1296. Instead it's 1077 long,
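The usual cause of this mismatch is missing values: `predict` silently drops rows containing `NA` predictors under its default `na.action`. A hedged fix that keeps every row of `dtest`:

```r
# na.pass keeps rows with missing predictors instead of dropping them;
# rpart can often still route such rows via its surrogate splits.
testRpart <- predict(fitRpart, newdata = dtest, na.action = na.pass)
length(testRpart)  # should now equal nrow(dtest)
```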

rpart: Computational time for categorical vs continuous regressors

Submitted by 不想你离开。 on 2019-12-07 04:46:50
Question: I am currently using the rpart package to fit a regression tree to data with relatively few observations and several thousand categorical predictors taking two possible values. From testing the package on smaller data, I know that in this instance it doesn't matter whether I declare the regressors as categorical (i.e. factors) or leave them as they are (they are coded as +/-1). However, I would still like to understand why passing my explanatory variables as factors significantly slows
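A small timing sketch, entirely illustrative (simulated ±1 data, not the asker's), that reproduces the comparison being asked about: the same matrix once as numeric columns and once as two-level factors.

```r
library(rpart)

set.seed(42)
n <- 200; p <- 500
X_num <- as.data.frame(matrix(sample(c(-1, 1), n * p, replace = TRUE), n, p))
X_fac <- as.data.frame(lapply(X_num, factor))  # same data, as 2-level factors
y <- rnorm(n)

system.time(rpart(y ~ ., data = data.frame(y = y, X_num)))
system.time(rpart(y ~ ., data = data.frame(y = y, X_fac)))
```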

How to apply weights in rpart?

Submitted by 北战南征 on 2019-12-06 12:53:18
I have this data on houses from the Kaggle practice competition and I'm using rpart to train a simple first model to predict the sale price. The model is not correctly identifying sales where the sale condition was abnormal or a down payment. Therefore, I'd like to increase the importance of this variable, which is obviously overlooked in the model. I'm assuming this is done by using the "weights" parameter, but how is this parameter used? How can I pinpoint which variables I want to have a higher weight? From the documentation:

weights: optional case weights.
cost: a vector of non-negative costs
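If the real goal is to make misclassifying abnormal sales costlier (rather than reweighting rows or variables), rpart's loss matrix is a more direct lever for a classification target. A sketch for a two-class outcome; the column `abnormal` and data frame `train` are assumptions, not from the question:

```r
library(rpart)

# loss[i, j] is the cost of predicting class j when the truth is class i
# (rows follow the factor levels of the outcome, diagonal must be zero).
# Here, assuming the first level is the abnormal class, missing an
# abnormal sale costs 4x a false alarm.
fit <- rpart(abnormal ~ ., data = train, method = "class",
             parms = list(loss = matrix(c(0, 4,
                                          1, 0), nrow = 2, byrow = TRUE)))
```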

Using ordinal variables in rpart and caret without converting to dummy categorical variables

Submitted by 核能气质少年 on 2019-12-06 07:52:35
Question: I am trying to create an ordinal regression tree in R using rpart, with the predictors mostly being ordinal data, stored as factor in R. When I create the tree using rpart, I get something like this: where the values are the factor values (e.g. A170 has labels ranging from -5 to 10). However, when I use caret to train the data using rpart and extract the final model, the tree no longer has ordinal predictors. See below for a sample output tree. As you can see above, it seems the ordinal
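The likely culprit is caret's formula interface, which runs the predictors through `model.matrix` and so expands factors into dummy columns before rpart ever sees them. Passing `x` and `y` directly avoids the expansion. A sketch; `training` and an outcome column named `y` are assumptions:

```r
library(caret)

# Non-formula interface: factors reach rpart intact, so splits are made
# on the (ordered) factor levels rather than on 0/1 dummy columns.
fit <- train(x = training[, setdiff(names(training), "y")],
             y = training$y,
             method = "rpart")
```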

How to get percentages from decision tree for each node

Submitted by 纵饮孤独 on 2019-12-06 07:42:56
How could I create a table that includes the percentages for each node in the plot below?

library(rpart)
library(rattle)
library(rpart.plot)
library(RColorBrewer)
fit <- rpart(Species ~ ., data=iris, method="class")
fancyRpartPlot(fit)

It results in this plot: I would like to output a table with species as the first column and the associated percent at each node in a second column. A second iteration of the table would exclude the first node (100%) and also remove duplicates by
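A hedged alternative to re-tabulating from the data: the proportions that `fancyRpartPlot` prints are already stored per node in `fit$frame`. The column layout below is an assumption about rpart's internal `yval2` matrix for a three-class `"class"` fit (fitted class, then per-class counts, then per-class proportions, then the node's share of the data):

```r
library(rpart)
fit <- rpart(Species ~ ., data = iris, method = "class")

props <- fit$frame$yval2[, 5:7]            # per-class proportions
rownames(props) <- rownames(fit$frame)     # node numbers (all nodes)
colnames(props) <- levels(iris$Species)
round(props, 3)
```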