decision-tree

CHAID regression tree to table conversion in R

Submitted by 大兔子大兔子 on 2019-11-28 07:04:09
Question: I used the CHAID package from this link. It gives me a chaid object that can be plotted. I want a decision table, with each decision rule in a column, instead of a decision tree, but I don't understand how to access the nodes and paths in this chaid object. Kindly help me. I followed the procedure given in this link. I can't post my data here since it is too long, so I am posting code that uses the sample dataset shipped with CHAID to perform the task, copied from the CHAID help manual: library

Used Variables in Tree

Submitted by 余生长醉 on 2019-11-28 06:07:03
Question: How can I find out which variables are actually used in a constructed tree?

```r
model = tree(status ~ ., set.train)
```

I can see the variables if I write summary(model):

```
tree(formula = status ~ ., data = set.train)
Variables actually used in tree construction:
[1] "spread1" "MDVP.Fhi.Hz." "DFA" "D2" "RPDE" "MDVP.Shimmer" "Shimmer.APQ5"
Number of terminal nodes: 8
Residual mean deviance: 0.04225 = 5.831 / 138
Distribution of residuals:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-0.9167  0.0000  0.0000  0.0000
```

Extracting predictors from a ctree object

Submitted by 落花浮王杯 on 2019-11-28 05:28:59
Question: I've checked "binary tree class methods" and "How to extract tree structure from ctree function?" (which was helpful for understanding the S4 object structure and slots), but it's still unclear how to get at the final predictors of a ctree object. For rpart, I'd use something like:

```r
extract_preds <- function(tt) {
  # Rows of the rpart frame whose variable is '<leaf>' are terminal nodes;
  # the remaining rows hold the splitting variables.
  leaves <- tt$frame$var == '<leaf>'
  as.character(unique(tt$frame$var[leaves == FALSE]))
}
```

Is there a similar shortcut available, or do I have to write a recursive function to traverse the ctree

Decision Tree in MATLAB

Submitted by *爱你&永不变心* on 2019-11-28 05:07:06
I saw the help in MATLAB, but it provides an example without explaining how to use the parameters of the 'classregtree' function. Any help explaining the use of 'classregtree' with its parameters would be appreciated.

The documentation page of the classregtree function is self-explanatory... Let's go over some of the most common parameters of the classification tree model:

- x: data matrix; rows are instances, columns are predicting attributes
- y: column vector, the class label for each instance
- categorical: specifies which attributes are of discrete type (as opposed to continuous)
- method: whether

Visualizing decision tree in scikit-learn

帅比萌擦擦* 提交于 2019-11-27 18:00:55
I am trying to design a simple decision tree using scikit-learn in Python (I am using Anaconda's IPython Notebook with Python 2.7.3 on Windows) and visualize it as follows:

```python
from pandas import read_csv, DataFrame
from sklearn import tree
from os import system

data = read_csv('D:/training.csv')
Y = data.Y
X = data.ix[:, "X0":"X33"]

dtree = tree.DecisionTreeClassifier(criterion="entropy")
dtree = dtree.fit(X, Y)

dotfile = open("D:/dtree2.dot", 'w')
dotfile = tree.export_graphviz(dtree, out_file=dotfile, feature_names=X.columns)
dotfile.close()
system("dot -Tpng D:.dot -o D:/dtree2.png")  # "D:.dot" looks like a typo for "D:/dtree2.dot"
```
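For reference, a minimal self-contained sketch of the same visualization that sidesteps Graphviz entirely; it assumes scikit-learn 0.21+ (for sklearn.tree.plot_tree) plus matplotlib, and trains on a built-in dataset rather than the asker's CSV:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn import tree

# Train a small tree on a built-in dataset, just to have something to draw.
iris = load_iris()
clf = tree.DecisionTreeClassifier(criterion="entropy", max_depth=3)
clf.fit(iris.data, iris.target)

# plot_tree renders with matplotlib, so no external "dot" binary is needed.
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, feature_names=iris.feature_names,
               class_names=list(iris.target_names), filled=True)
plt.savefig("dtree.png")  # written to the current working directory
```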

Why is Random Forest with a single tree much better than a Decision Tree classifier?

Submitted by 我们两清 on 2019-11-27 14:48:55
I am learning machine learning with the scikit-learn library. I apply the decision tree classifier and the random forest classifier to my data with this code:

```python
def decision_tree(train_X, train_Y, test_X, test_Y):
    clf = tree.DecisionTreeClassifier()
    clf.fit(train_X, train_Y)
    return clf.score(test_X, test_Y)

def random_forest(train_X, train_Y, test_X, test_Y):
    clf = RandomForestClassifier(n_estimators=1)
    clf = clf.fit(X, Y)  # note: X, Y here are presumably meant to be train_X, train_Y
    return clf.score(test_X, test_Y)
```

Why are the results so much better for the random forest classifier (for 100 runs, with randomly sampling 2/3 of the data for the training and 1/3
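One likely source of the gap is the defaults: even with a single tree, RandomForestClassifier bootstraps the training rows and considers only a random subset of features at each split, while DecisionTreeClassifier uses every row and every feature. A sketch of switching that randomness off to make the comparison fair (the dataset and split here are illustrative, not the asker's data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=1/3, random_state=0)

# Plain decision tree: all rows, all features at every split.
dt = DecisionTreeClassifier(random_state=0).fit(train_X, train_y)

# Single-tree "forest" with the forest-specific randomness disabled:
# no bootstrap resampling, and all features considered at each split.
rf = RandomForestClassifier(n_estimators=1, bootstrap=False,
                            max_features=None, random_state=0).fit(train_X, train_y)

print(dt.score(test_X, test_y), rf.score(test_X, test_y))  # scores should now be close
```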

Get decision tree rule/path pattern for every row of predicted dataset for rpart/ctree package in R

Submitted by 拜拜、爱过 on 2019-11-27 14:09:47
I have built decision tree models in R using rpart and ctree. I have also predicted on a new dataset using the built models and obtained predicted probabilities and classes. However, I would like to extract, as a single string, the rule/path that every observation in the predicted dataset has followed. Storing this data in tabular format, I could explain each prediction with its reason in an automated manner, without opening R. In other words, I want the following:

```
ObsID  Probability  PredictedClass  PathFollowed
1      0.68         Safe            CarAge < 10 & Country = Germany & Type = Compact & Price < 12822.5
2      0.76         Safe            CarAge < 10 &
```
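This is not an rpart/ctree answer, but for comparison, scikit-learn exposes exactly this idea through decision_path; a sketch on a built-in dataset, building one "condition & condition" string per observation:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

node_indicator = clf.decision_path(X)  # sparse matrix: observations x nodes visited
leaf_id = clf.apply(X)                 # leaf node id reached by each observation
t = clf.tree_

for i in range(3):  # first three observations, for brevity
    nodes = node_indicator.indices[
        node_indicator.indptr[i]:node_indicator.indptr[i + 1]]
    conditions = []
    for node in nodes:
        if node == leaf_id[i]:
            continue  # the leaf itself carries no split condition
        op = "<=" if X[i, t.feature[node]] <= t.threshold[node] else ">"
        conditions.append(
            f"{iris.feature_names[t.feature[node]]} {op} {t.threshold[node]:.2f}")
    print(i, " & ".join(conditions))
```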

What does `sample_weight` do to the way a `DecisionTreeClassifier` works in sklearn?

Submitted by 随声附和 on 2019-11-27 12:41:14
I've read in this documentation that: "Class balancing can be done by sampling an equal number of samples from each class, or preferably by normalizing the sum of the sample weights (sample_weight) for each class to the same value." But it is still unclear to me how this works. If I set sample_weight with an array of only two possible values, 1's and 2's, does this mean that the samples with 2's will get sampled twice as often as the samples with 1's when doing the bagging? I cannot think of a practical example for this.

Matt Hancock: So I spent a little time looking at the sklearn
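In a single decision tree there is no resampling at all: sample_weight simply weights each sample's contribution to the node counts and impurity. One way to convince yourself is that an integer weight behaves like physically duplicating the row; a small sketch on toy data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])

# Give the last sample a weight of 2...
w = np.array([1.0, 1.0, 1.0, 2.0])
clf_weighted = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=w)

# ...which should produce the same tree as duplicating that row outright.
X_dup = np.vstack([X, X[-1:]])
y_dup = np.append(y, y[-1])
clf_dup = DecisionTreeClassifier(random_state=0).fit(X_dup, y_dup)

# Identical split thresholds => identical tree structure on this data.
print(np.array_equal(clf_weighted.tree_.threshold, clf_dup.tree_.threshold))
```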

How to explain the decision tree from scikit-learn

Submitted by 徘徊边缘 on 2019-11-27 10:52:58
Question: I have two problems with understanding the result of a decision tree from scikit-learn. For example, this is one of my decision trees. My question is: how can I use the tree? The first question: if a sample satisfies the condition, it goes to the LEFT branch (if it exists); otherwise it goes RIGHT. In my case, if a sample has X[7] > 63521.3984, then the sample will go to the green box. Correct? The second question: when a sample reaches the leaf node, how can I know
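Regarding the second question, the fitted tree's internals record what each leaf has seen; a minimal sketch of walking the nodes and printing the per-leaf class tallies (note that in scikit-learn, samples satisfying "feature <= threshold" go left):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
t = clf.tree_

for node in range(t.node_count):
    if t.children_left[node] == -1:  # -1 marks a leaf
        # Per-class totals at this leaf (raw weighted counts in older
        # scikit-learn releases, normalized fractions in recent ones);
        # the argmax is the class the leaf predicts either way.
        counts = t.value[node][0]
        print(f"leaf {node}: class tallies {counts}, predicts class {counts.argmax()}")
    else:
        print(f"node {node}: X[{t.feature[node]}] <= {t.threshold[node]:.3f} -> left")
```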

How to access weighting of individual decision trees in xgboost?

Submitted by 丶灬走出姿态 on 2019-11-27 10:38:19
Question: I'm using xgboost for ranking with

```python
param = {'objective': 'rank:pairwise', 'booster': 'gbtree'}
```

As I understand it, gradient boosting works by calculating the weighted sum of the learned decision trees. How can I access the weights that are assigned to each learned booster? I want to try post-processing the weights after training to speed up the prediction step, but I don't know how to get the individual weights. When using dump_model(), the different decision trees can be seen in the created file
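For context, xgboost does not keep a separate weight per tree: the shrinkage (eta) is already folded into each tree's leaf values, so a prediction is just the sum of the matched leaf values plus the base score. A sketch of training a toy booster and inspecting the per-tree dumps (the data here is made up purely for illustration):

```python
import numpy as np
import xgboost as xgb

# Toy regression data, just to have a trained booster to inspect.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=100)

dtrain = xgb.DMatrix(X, label=y)
bst = xgb.train({'eta': 0.3, 'max_depth': 2}, dtrain, num_boost_round=5)

# One text dump per tree; the leaf values already include the eta
# shrinkage, which is why no separate per-tree weight is exposed.
for i, tree_text in enumerate(bst.get_dump()):
    print(f"--- tree {i} ---\n{tree_text}")
```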