decision-tree

Getting the observations in an rpart node (i.e., CART)

≯℡__Kan透↙ submitted on 2019-11-29 07:35:05
I would like to inspect all the observations that reached some node in an rpart decision tree. For example, in the following code:

    fit <- rpart(Kyphosis ~ Age + Start, data = kyphosis)
    fit
    n= 81

    node), split, n, loss, yval, (yprob)
          * denotes terminal node

     1) root 81 17 absent (0.79012346 0.20987654)
       2) Start>=8.5 62 6 absent (0.90322581 0.09677419)
         4) Start>=14.5 29 0 absent (1.00000000 0.00000000) *
         5) Start< 14.5 33 6 absent (0.81818182 0.18181818)
          10) Age< 55 12 0 absent (1.00000000 0.00000000) *
          11) Age>=55 21 6 absent (0.71428571 0.28571429)
            22) Age>=111 14 2 absent (0.85714286 0.14285714
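A minimal sketch of one way to do this (my own illustration, not from the original thread): rpart records in fit$where, for each observation used in the fit, the row of fit$frame for its terminal node, and rpart's node numbering makes the children of node k be 2k and 2k+1, so membership in any internal node can be recovered by walking leaf numbers up to their ancestors.

    # Sketch: collect the observations that reached a given rpart node.
    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Start, data = kyphosis)

    node_obs <- function(fit, data, node) {
      # terminal-node number for each observation used in the fit
      leaves <- as.numeric(rownames(fit$frame))[fit$where]
      is_descendant <- function(leaf, node) {
        while (leaf >= node) {
          if (leaf == node) return(TRUE)
          leaf <- leaf %/% 2   # the parent of node n is n %/% 2
        }
        FALSE
      }
      data[sapply(leaves, is_descendant, node = node), ]
    }

    node_obs(fit, kyphosis, 5)   # all observations that reached node 5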

Search for corresponding node in a regression tree using rpart

谁说我不能喝 submitted on 2019-11-29 04:31:40
I'm pretty new to R and I'm stuck with a pretty dumb problem. I'm calibrating a regression tree using the rpart package in order to do some classification and some forecasting. Thanks to R, the calibration part is easy to do and easy to control.

    # the package rpart is needed
    library(rpart)

    # Loading of a big data file used for calibration
    my_data <- read.csv("my_file.csv", sep=",", header=TRUE)

    # Regression tree calibration
    tree <- rpart(Ratio ~ Attribute1 + Attribute2 + Attribute3 +
                          Attribute4 + Attribute5,
                  method="anova", data=my_data,
                  control=rpart.control(minsplit=100, cp=0.0001))

After
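One common way to answer this (a sketch under the assumption that the goal is to find which terminal node each observation falls into; new_data is a hypothetical data frame with the same attributes): convert the fitted rpart tree to a partykit object, whose predict() method can return node ids. Note that partykit renumbers nodes depth-first, so the ids differ from rpart's own numbering.

    # Sketch: map observations to the terminal node they reach.
    library(partykit)

    party_tree <- as.party(tree)   # 'tree' is the rpart fit above
    node_ids <- predict(party_tree, newdata = new_data, type = "node")
    table(node_ids)                # how many rows land in each terminal node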

How to access the weighting of individual decision trees in xgboost?

泄露秘密 submitted on 2019-11-28 18:01:16
I'm using xgboost for ranking with

    param = {'objective': 'rank:pairwise', 'booster': 'gbtree'}

As I understand it, gradient boosting works by calculating the weighted sum of the learned decision trees. How can I access the weights that are assigned to each learned booster? I wanted to try to post-process the weights after training to speed up the prediction step, but I don't know how to get the individual weights. When using dump_model(), the different decision trees can be seen in the created file, but no weighting is stored there. In the API I haven't found a suitable function. Or can I calculate
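A minimal sketch of the underlying point (my illustration with a toy regression objective rather than the ranking setup above; iteration_range assumes a reasonably recent xgboost, 1.4+): a gbtree model stores no separate per-tree weights. The learning rate eta is already folded into each tree's leaf values, and a prediction is simply the base_score plus the sum of leaf values, so per-tree contributions can be recovered by differencing staged predictions.

    # Sketch: recover each tree's contribution by differencing predictions.
    import numpy as np
    import xgboost as xgb

    X, y = np.random.rand(100, 5), np.random.rand(100)
    dtrain = xgb.DMatrix(X, label=y)
    bst = xgb.train({'objective': 'reg:squarederror', 'eta': 0.3},
                    dtrain, num_boost_round=5)

    # predictions using only the first k trees, k = 1..5
    staged = [bst.predict(dtrain, iteration_range=(0, k)) for k in range(1, 6)]
    # contribution of each later tree is the difference between stages;
    # tree 1's contribution is staged[0] minus the base_score (0.5 by default)
    per_tree = [staged[k] - staged[k - 1] for k in range(1, 5)]
    print(per_tree[0][:3])   # contribution of the second tree, first 3 samples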

How to explain the decision tree from scikit-learn

时光毁灭记忆、已成空白 submitted on 2019-11-28 17:54:38
I have two problems with understanding the result of a decision tree from scikit-learn. For example, this is one of my decision trees. My question is how I can use the tree.

The first question: if a sample satisfies the condition, does it go to the LEFT branch (if it exists), and otherwise RIGHT? In my case, if a sample has X[7] > 63521.3984, will the sample go to the green box? Correct?

The second question: when a sample reaches the leaf node, how can I know which category it belongs to? In this example, I have three categories to classify. In the red box, there are
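A minimal sketch of both answers (standard sklearn behavior, demonstrated on the iris data rather than the poster's tree): samples with X[:, feature] <= threshold go to the left child and the rest go right, and a leaf's predicted category is the class with the largest count in tree_.value.

    # Sketch: read off the predicted class at the leaf a sample reaches.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(random_state=0).fit(X, y)
    t = clf.tree_

    leaf = clf.apply(X[:1])[0]     # leaf node reached by the first sample
    counts = t.value[leaf][0]      # per-class counts (or weights) at the leaf
    print(clf.classes_[np.argmax(counts)])   # the category it belongs to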

Help Understanding Cross Validation and Decision Trees

喜欢而已 submitted on 2019-11-28 16:13:48
I've been reading up on Decision Trees and Cross Validation, and I understand both concepts. However, I'm having trouble understanding Cross Validation as it pertains to Decision Trees. Essentially, Cross Validation allows you to alternate between training and testing when your dataset is relatively small, to maximize your error estimation. A very simple algorithm goes something like this (see the sketch after the list):

1. Decide on the number of folds you want (k)
2. Subdivide your dataset into k folds
3. Use k-1 folds for a training set to build a tree.
4. Use the testing set to estimate statistics about the error in your tree.
5. Save
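A minimal sketch of those steps (assumed sklearn setup; X and y are any feature matrix and label vector):

    # Sketch: k-fold cross validation around a decision tree.
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    kf = KFold(n_splits=5, shuffle=True, random_state=0)   # steps 1-2

    scores = []
    for train_idx, test_idx in kf.split(X):
        tree = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])  # step 3
        scores.append(tree.score(X[test_idx], y[test_idx]))              # step 4
    print(np.mean(scores))   # step 5: aggregate the k error estimates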

How do I solve overfitting in random forest of Python sklearn?

你离开我真会死。 submitted on 2019-11-28 16:01:59
Question: I am using the RandomForestClassifier implemented in the python sklearn package to build a binary classification model. Below are the results of the cross validations:

    Fold 1 : Train: 164 Test: 40
    Train Accuracy: 0.914634146341
    Test Accuracy: 0.55

    Fold 2 : Train: 163 Test: 41
    Train Accuracy: 0.871165644172
    Test Accuracy: 0.707317073171

    Fold 3 : Train: 163 Test: 41
    Train Accuracy: 0.889570552147
    Test Accuracy: 0.585365853659

    Fold 4 : Train: 163 Test: 41
    Train Accuracy: 0.871165644172
    Test Accuracy: 0
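A minimal sketch of the usual remedies (illustrative hyperparameter values, not a prescription for this dataset): with roughly 200 samples, the gap between train and test accuracy above points to trees that are too deep, so constrain tree complexity and grow more trees.

    # Sketch: regularize a random forest against overfitting.
    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(
        n_estimators=500,        # more trees stabilize the ensemble
        max_depth=5,             # cap tree depth
        min_samples_leaf=5,      # require several samples per leaf
        max_features='sqrt',     # decorrelate the trees
        random_state=0,
    )
    # clf.fit(X_train, y_train), then compare train vs. test accuracy again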

How to compute error rate from a decision tree?

流过昼夜 submitted on 2019-11-28 15:45:33
Question: Does anyone know how to calculate the error rate for a decision tree with R? I am using the rpart() function.

Answer 1: Assuming you mean computing the error rate on the sample used to fit the model, you can use printcp(). For example, using the on-line example:

    > library(rpart)
    > fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
    > printcp(fit)

    Classification tree:
    rpart(formula = Kyphosis ~ Age + Number + Start, data = kyphosis)

    Variables actually used in tree construction:
    [1] Age Start
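A sketch of the follow-up computation (my addition, not part of the quoted answer): printcp() reports errors relative to the root node error, so multiplying the cptable columns by the root error gives absolute rates.

    # Sketch: absolute error rates from the rpart cptable.
    root_err <- fit$frame$dev[1] / fit$frame$n[1]       # misclassification rate at the root
    cp <- fit$cptable
    train_err <- root_err * cp[nrow(cp), "rel error"]   # resubstitution error
    cv_err    <- root_err * cp[nrow(cp), "xerror"]      # cross-validated estimate
    c(train = train_err, cv = cv_err)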

How do I find which attributes my tree splits on, when using scikit-learn?

笑着哭i submitted on 2019-11-28 15:18:39
Question: I have been exploring scikit-learn, making decision trees with both entropy and gini splitting criteria, and exploring the differences. My question is: how can I "open the hood" and find out exactly which attributes the trees are splitting on at each level, along with their associated information values, so I can see where the two criteria make different choices? So far, I have explored the 9 methods outlined in the documentation. They don't appear to allow access to this information. But
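A minimal sketch of where that information lives (standard sklearn attributes, demonstrated on the iris data): the fitted estimator's tree_ object exposes per-node split features, thresholds, and impurities, and export_text() prints the whole rule set.

    # Sketch: inspect the splits chosen under each criterion.
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    for criterion in ("gini", "entropy"):
        clf = DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X, y)
        t = clf.tree_
        # feature[i]: split attribute at node i (-2 marks a leaf);
        # threshold[i]: its cut point; impurity[i]: the node's gini/entropy
        print(criterion, t.feature[:5], t.threshold[:5], t.impurity[:5])
        print(export_text(clf))   # full rule listing, one node per line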

How to extract decision rules from a random forest in Python

感情迁移 submitted on 2019-11-28 14:25:33
I have one question, though. I heard from someone that in R you can use extra packages to extract the decision rules implemented in an RF; I tried to google the same thing in Python but without luck. Any help on how to achieve that would be appreciated. Thanks in advance!

Assuming that you use the sklearn RandomForestClassifier, you can find the individual decision trees as .estimators_. Each tree stores the decision nodes as a number of NumPy arrays under tree_. Here is some example code which just prints each node in order of the array. In a typical application one would instead traverse by following the
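The answer's example code was cut off above; here is a minimal sketch in the same spirit (my reconstruction, not the original code): print every node of every tree in the forest from the tree_ arrays.

    # Sketch: dump each decision node of each tree in a random forest.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    X, y = load_iris(return_X_y=True)
    rf = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

    for i, est in enumerate(rf.estimators_):   # the individual decision trees
        t = est.tree_
        print(f"--- tree {i} ---")
        for node in range(t.node_count):
            if t.children_left[node] == -1:    # -1 marks a leaf
                print(f"node {node}: leaf, value={t.value[node].ravel()}")
            else:
                print(f"node {node}: if X[{t.feature[node]}] <= "
                      f"{t.threshold[node]:.3f} go to {t.children_left[node]}, "
                      f"else {t.children_right[node]}")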

C5.0 decision tree - c50 code called exit with value 1

♀尐吖头ヾ submitted on 2019-11-28 12:05:26
I am getting the following error:

    c50 code called exit with value 1

I am doing this on the titanic data available from Kaggle:

    # Importing datasets
    train <- read.csv("train.csv", sep=",")

    # this is the structure
    str(train)

Output:

    'data.frame': 891 obs. of 12 variables:
     $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
     $ Survived   : int 0 1 1 1 0 0 0 0 1 1 ...
     $ Pclass     : int 3 1 3 1 3 3 1 3 3 2 ...
     $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
     $ Sex        : Factor w/ 2 levels "female","male": 2 1 1 1 2 2 2 2 1 1 ...
     $ Age        : num 22 38 26 35 35 NA 54 2 27
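A sketch of one common fix (an assumption on my part: this message is C5.0's generic failure signal, and on this dataset it is typically triggered by a non-factor outcome or by empty strings in factor columns):

    # Sketch: coerce the outcome to a factor and treat "" as missing.
    library(C50)

    train <- read.csv("train.csv", na.strings = c("", "NA"))  # "" becomes NA
    train$Survived <- as.factor(train$Survived)               # outcome must be a factor

    model <- C5.0(Survived ~ Pclass + Sex + Age, data = train)
    summary(model)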