rpart

Data Prediction using Decision Tree of rpart

妖精的绣舞 submitted on 2021-02-18 22:28:47
Question: I am using R to classify a data frame called 'd' containing data structured like below: The data has 576666 rows, and the column "classLabel" is a factor with 3 levels: ONE, TWO, THREE. I am building a decision tree using rpart:
    fitTree = rpart(d$classLabel ~ d$tripduration + d$from_station_id + d$gender + d$birthday)
And I want to predict the values of "classLabel" for newdata:
    newdata = data.frame( tripduration=c(345,244,543,311), from_station_id=c(60,28,100,56), gender=c("Male","Female",
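A likely stumbling block with the call above is that a formula written with `d$` prefixes ties every term to the original data frame, so `predict()` cannot substitute the columns of `newdata`. The sketch below shows the usual alternative: bare column names plus a `data` argument. The data frame is synthetic and only borrows the column names from the question; the `birthday` column is left out, and the labelling rule and the last two `gender` values are made up for illustration.

```r
library(rpart)

# Synthetic stand-in for the question's data frame `d` (assumption: the real
# data is not shown, so only the column names are borrowed from the question).
set.seed(42)
n <- 300
d <- data.frame(
  tripduration    = round(runif(n, 60, 900)),
  from_station_id = sample(1:100, n, replace = TRUE),
  gender          = factor(sample(c("Male", "Female"), n, replace = TRUE))
)
# Hypothetical labelling rule so that the tree has something to learn.
d$classLabel <- cut(d$tripduration, breaks = c(0, 300, 600, Inf),
                    labels = c("ONE", "TWO", "THREE"))

# Bare column names plus `data = d`, instead of d$... terms.
fitTree <- rpart(classLabel ~ tripduration + from_station_id + gender,
                 data = d, method = "class")

# New rows must use the same column names (and factor levels) as `d`.
newdata <- data.frame(
  tripduration    = c(345, 244, 543, 311),
  from_station_id = c(60, 28, 100, 56),
  gender          = factor(c("Male", "Female", "Male", "Female"),
                           levels = levels(d$gender))
)

pred <- predict(fitTree, newdata = newdata, type = "class")
print(pred)
```

With `type = "class"` the result is a factor with one predicted label per row of `newdata`.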

Optimising caret for sensitivity still seems to optimise for ROC

梦想与她 submitted on 2021-01-20 08:06:46
Question: I'm trying to maximise sensitivity in my model selection in caret using rpart. To this end, I tried to replicate the method given on caret's GitHub page (scroll down to the example with the user-defined function FourStat):
    # create own function so we can use "sensitivity" as our metric to maximise:
    Sensitivity.fc <- function (data, lev = levels(data$obs), model = NULL) {
      out <- c(twoClassSummary(data, lev = levels(data$obs), model = NULL))
      c(out, Sensitivity = out["Sens"])
    }
    rpart_caret_fit

Optimising caret for sensitivity still seems to optimise for ROC

不羁的心 submitted on 2021-01-20 08:05:37
Question: I'm trying to maximise sensitivity in my model selection in caret using rpart. To this end, I tried to replicate the method given on caret's GitHub page (scroll down to the example with the user-defined function FourStat):
    # create own function so we can use "sensitivity" as our metric to maximise:
    Sensitivity.fc <- function (data, lev = levels(data$obs), model = NULL) {
      out <- c(twoClassSummary(data, lev = levels(data$obs), model = NULL))
      c(out, Sensitivity = out["Sens"])
    }
    rpart_caret_fit

Wrong labels in rpart tree

限于喜欢 submitted on 2020-04-16 01:59:45
Question: I am running into a labelling issue when using rpart in R. Here's my situation: I'm working on a dataset with categorical variables. Here's an extract of my data:
    head(Dataset)
    Entity IL CP TD Budget
         2  1  3  2    250
         5  2  2  1    663
         6  1  2  3    526
         2  3  1  2    522
When I plot my decision tree and add the labels using
    plot(tree)
    text(tree)
I get wrong labels: for Entity, I get "abcd". Why do I get that, and how can I fix it? Thank you for your help.
Answer 1: By default plot.rpart will just label the levels of
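The single-letter labels come from rpart's default abbreviation of factor levels in split labels. A minimal sketch of how to switch it off is given here, using a made-up data frame with a hypothetical `Entity` factor and `Budget` response; `labels()` with `minlength = 0` shows the text that `text(tree, pretty = 0)` would draw.

```r
library(rpart)

# Made-up data: `Entity` is a factor whose levels drive `Budget`.
set.seed(1)
n <- 200
dat <- data.frame(
  Entity = factor(sample(c("North", "South", "East", "West"), n, replace = TRUE))
)
dat$Budget <- ifelse(dat$Entity %in% c("North", "South"), 500, 200) +
  rnorm(n, sd = 20)

tree <- rpart(Budget ~ Entity, data = dat)

# Default split labels abbreviate factor levels to single letters.
short_labels <- labels(tree)
# minlength = 0 turns the abbreviation off and keeps the full level names.
full_labels <- labels(tree, minlength = 0)

print(short_labels)
print(full_labels)

# When plotting, the same effect is requested with `pretty = 0`:
# plot(tree)
# text(tree, pretty = 0)
```

The letters follow the order of `levels()` of the factor, so "a" is the first level, "b" the second, and so on.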

Error in eval(predvars, data, env) : object 'Rm' not found

风格不统一 submitted on 2020-03-19 04:01:08
Question:
    dataset = read.csv('dataset/housing.header.binary.txt')
    dataset1 = dataset[6] # highest positive correlation
    dataset2 = dataset[13] # lowest negative correlation
    dependentVal = dataset[14] # dependent value
    new_dataset = cbind(dataset1, dataset2, dependentVal) # new matrix
    # split dataset
    # install.packages('caTools')
    library(caTools)
    set.seed(123) # this is needed to guarantee that every run will produce the same output
    split = sample.split(new_dataset, SplitRatio = 0.75)
    train_set = subset(new

How to prune a tree in R?

给你一囗甜甜゛ submitted on 2020-01-20 03:17:28
Question: I'm doing a classification using rpart in R. The tree model is trained by:
    > tree <- rpart(activity ~ . , data=trainData)
    > pData1 <- predict(tree, testData, type="class")
The accuracy for this tree model is:
    > sum(testData$activity==pData1)/length(pData1)
    [1] 0.8094276
I read a tutorial on pruning the tree by cross-validation:
    > ptree <- prune(tree,cp=tree$cptable[which.min(tree$cptable[,"xerror"]),"CP"])
    > pData2 <- predict(ptree, testData, type="class")
The accuracy rate for the pruned tree
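The pruning step quoted above can be tried end to end on data that ships with rpart. This is only a sketch: `kyphosis` stands in for the question's `trainData`, and the helper `n_leaves` is a name introduced here for illustration.

```r
library(rpart)

# `kyphosis` ships with rpart; it stands in for the question's trainData.
set.seed(123)  # cross-validated errors in cptable depend on the random folds
tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# One row per candidate subtree: CP, nsplit, rel error, xerror, xstd.
print(tree$cptable)

# Pick the CP value with the smallest cross-validated error and prune to it.
best_row <- which.min(tree$cptable[, "xerror"])
best_cp  <- tree$cptable[best_row, "CP"]
ptree    <- prune(tree, cp = best_cp)

# Hypothetical helper: number of terminal nodes in an rpart fit.
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")
cat("leaves before:", n_leaves(tree), " after:", n_leaves(ptree), "\n")
```

If the smallest `xerror` happens to sit in the last row of `cptable`, pruning returns the tree unchanged, so identical predictions before and after pruning are possible.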

Formulate data for rpart

笑着哭i submitted on 2020-01-17 00:44:11
Question: How can I concatenate the column names of a list to prepare a formula for rpart? I just want to concatenate names(log_data), where log_data is a list of 60 distinct vectors, so that I can put the column names into an rpart formula in R, like rpart(A ~ B + C + D + E, log_data). So here I just want to extract formula="A~B+C+D+E" as a whole string, where A, B, C, D, E are the column names to be extracted from log_data. Or is there any better way to get a
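Two base R ways of assembling such a formula from names are sketched below. The small `log_data` here is a hypothetical stand-in with five columns; if the real object is a plain list of equal-length vectors, it would first need `as.data.frame(log_data)`.

```r
library(rpart)

# Hypothetical stand-in for log_data: response A plus predictors B..E.
set.seed(7)
log_data <- data.frame(A = rnorm(50), B = rnorm(50), C = rnorm(50),
                       D = rnorm(50), E = rnorm(50))

response   <- "A"
predictors <- setdiff(names(log_data), response)

# Option 1: build the formula as a string, then convert it.
formula_string <- paste(response, "~", paste(predictors, collapse = " + "))
f1 <- as.formula(formula_string)

# Option 2: let reformulate() assemble it from the names directly.
f2 <- reformulate(predictors, response = response)

print(formula_string)  # "A ~ B + C + D + E"
fit <- rpart(f2, data = log_data)
```

When every remaining column is a predictor, the shorthand `rpart(A ~ ., data = log_data)` avoids building the string at all.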

rpart - Find number of leaves that a cp value to pruning a tree would return

萝らか妹 submitted on 2019-12-31 07:42:23
Question: I have a requirement where I need to group my categorical variables (those having more than 5 category values) into 5 groups based on their association with my continuous variable. To achieve this I am using rpart with the "anova" method. For example, my categorical variable is type, with codes 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15, and I want to have 5 groups of this variable. After running the tree, in order to have only 5 groups I need to prune the tree. One way I tried is to use the nsplit from
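One way to relate a cp value to the number of leaves is sketched below. It uses `car.test.frame`, a dataset shipped with rpart, as a stand-in for the question's data (a factor `Type` and a continuous `Mileage`); `count_leaves` and `leaf_table` are names introduced here for illustration.

```r
library(rpart)

# car.test.frame ships with rpart; Type is a factor and Mileage is continuous.
# cp = 0 and small node sizes grow a deliberately large tree to prune back.
fit <- rpart(Mileage ~ Type, data = car.test.frame, method = "anova",
             control = rpart.control(cp = 0, minsplit = 2, minbucket = 1))

# Each cptable row is one candidate subtree; leaves = nsplit + 1.
cpt <- fit$cptable
leaf_table <- data.frame(CP = cpt[, "CP"], leaves = cpt[, "nsplit"] + 1)
print(leaf_table)

# Count leaves directly for any cp by pruning and inspecting the frame.
count_leaves <- function(tree, cp) {
  sum(prune(tree, cp = cp)$frame$var == "<leaf>")
}

# Largest candidate subtree with at most 5 leaves (5 groups of levels).
ok     <- which(leaf_table$leaves <= 5)
cp_5   <- leaf_table$CP[max(ok)]
pruned <- prune(fit, cp = cp_5)

# Which leaf (group) each level of Type ends up in.
print(table(car.test.frame$Type, pruned$where))
```

The `nsplit` column can skip values, so a subtree with exactly 5 leaves may not exist; the sketch then settles for the largest one with fewer.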

Get decision tree rule/path pattern for every row of predicted dataset for rpart/ctree package in R

故事扮演 submitted on 2019-12-28 02:57:27
Question: I have built a decision tree model in R using rpart and ctree. I have also scored a new dataset with the built model and obtained predicted probabilities and classes. However, I would like to extract the rule/path, as a single string, that every observation in the predicted dataset has followed. By storing this data in tabular format, I can explain each prediction with its reason in an automated manner without opening R. That is, I want to get the following:
    ObsID Probability PredictedClass PathFollowed
    1 0
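For the rpart half of the question, a table of this shape can be sketched with `path.rpart()`. Several things here are assumptions: `kyphosis` stands in for the real data, "Probability" is taken to mean the highest class probability, and the trick of overwriting `yval` with node numbers is a workaround rather than an official API.

```r
library(rpart)

# `kyphosis` ships with rpart and stands in for both model and scoring data.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
new_obs <- kyphosis[1:10, ]  # hypothetical "new" rows to explain

# Workaround (assumption, not an official API): copy the fit and overwrite the
# fitted value of every node with its node number, so that predict() reports
# the leaf each row lands in.
node_fit <- fit
node_fit$frame$yval <- as.numeric(rownames(node_fit$frame))
leaf_id <- predict(node_fit, newdata = new_obs, type = "vector")

# path.rpart() returns the split conditions from the root to each node.
paths <- path.rpart(fit, nodes = unique(leaf_id), print.it = FALSE)
path_string <- vapply(paths, function(p) paste(p[-1], collapse = " & "),
                      character(1))  # p[1] is always "root"

result <- data.frame(
  ObsID          = rownames(new_obs),
  Probability    = apply(predict(fit, new_obs, type = "prob"), 1, max),
  PredictedClass = predict(fit, new_obs, type = "class"),
  PathFollowed   = unname(path_string[as.character(leaf_id)]),
  stringsAsFactors = FALSE
)
print(result)
```

The resulting data frame can be written out with `write.csv()` so the explanations are readable without R.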

R rpart: No splits if I remove less important variables

混江龙づ霸主 submitted on 2019-12-24 04:56:31
Question: I am trying to understand how rpart works in a project that I am trying to complete. I am relatively new to R, but I have a lot of experience using SAS to build a variety of analytical models. First I ran this piece of code:
    mtree1 <- rpart(X17~., data = mydata, method="class", control = rpart.control(minsplit = 20, minbucket = 7, maxdepth = 10, usesurrogate = 2, xval =10 ))
I get a tree with X12 as the top split, X10 as the next split on the LHS, X69 on the RHS, and then X68 and X70 on that