decision-tree

Binary decision tree model when the proportion of one of the labels is almost null

Submitted by 可紊 on 2019-12-24 20:25:00
Question: I want to build a decision tree that predicts one of two labels, "YES" or "NO". The dataset I am working with is 99% "YES" answers and only 1% "NO" answers. When I ran the model, it scored up to 97% accuracy. Is this a valid model, or are there considerations to take into account when working with such unbalanced proportions? I am afraid that, because of the large amount of "YES" data, the model looks accurate simply by answering "YES" to everything. The "NO"s are very …
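
With a 99:1 split, raw accuracy says little: a model that answers "YES" to everything already scores 99%. A minimal sketch of two common remedies in scikit-learn, class weighting and per-class metrics; the data below is synthetic and purely illustrative:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import classification_report, balanced_accuracy_score

    # Toy data with a 99:1 "YES"/"NO" imbalance (illustrative only).
    rng = np.random.RandomState(0)
    X = rng.rand(10000, 5)
    y = np.where(rng.rand(10000) < 0.01, "NO", "YES")

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # class_weight="balanced" re-weights classes inversely to frequency,
    # penalizing the tree for predicting "YES" across the board.
    clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
    clf.fit(X_train, y_train)

    pred = clf.predict(X_test)
    # Per-class precision/recall expose what raw accuracy hides on 99:1 data.
    print(classification_report(y_test, pred))
    print("balanced accuracy:", balanced_accuracy_score(y_test, pred))

The number to watch is recall on the "NO" class; that is the one an always-"YES" model cannot fake.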

What splitting criterion does Random Tree in Weka 3.7.11 use for numerical attributes?

Submitted by 坚强是说给别人听的谎言 on 2019-12-24 17:38:10
Question: I'm using RandomForest from Weka 3.7.11, which in turn bags Weka's RandomTree. My input attributes are numerical, and the output attribute (label) is also numerical. When training the RandomTree, K attributes are chosen at random at each node of the tree; several splits based on those attributes are attempted and the "best" one is chosen. How does Weka determine which split is best in this (numerical) case? For nominal attributes I believe Weka uses the information gain criterion, which …
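
For a numeric target, the standard criterion for regression trees is variance reduction: pick the threshold that most reduces the weighted label variance across the two children. Whether Weka's RandomTree implements exactly this is best confirmed in its source; the sketch below (plain NumPy, all names my own) only illustrates the criterion itself:

    import numpy as np

    def variance_reduction(y, left_mask):
        """Weighted decrease in label variance achieved by a candidate split."""
        y_left, y_right = y[left_mask], y[~left_mask]
        if len(y_left) == 0 or len(y_right) == 0:
            return 0.0
        n = len(y)
        weighted = (len(y_left) * np.var(y_left)
                    + len(y_right) * np.var(y_right)) / n
        return np.var(y) - weighted

    def best_numeric_split(x, y):
        """Try midpoints between consecutive distinct values of x."""
        best_gain, best_threshold = 0.0, None
        values = np.unique(x)
        for threshold in (values[:-1] + values[1:]) / 2:
            gain = variance_reduction(y, x <= threshold)
            if gain > best_gain:
                best_gain, best_threshold = gain, threshold
        return best_threshold, best_gain

    # The label jumps at x = 5, so the best threshold lands at 4.5.
    x = np.arange(10, dtype=float)
    y = np.where(x < 5, 1.0, 10.0)
    print(best_numeric_split(x, y))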

Scikit Decision tree categorical features

Submitted by ↘锁芯ラ on 2019-12-24 11:37:53
Question: There is a well-known problem in Tom Mitchell's Machine Learning book: build a decision tree from the following data, where Play ball is the target variable (the data table and resulting tree were shown as images). I wonder whether it is possible to build this tree with scikit-learn. I found several examples where a decision tree can be depicted with export_graphviz(clf) and Source(export_graphviz(clf, out_file=None)). However, it looks like scikit-learn doesn't work well with categorical data; the data has to be binarized into …
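
scikit-learn's trees split only on numeric features, so categorical columns are usually one-hot encoded first. A minimal sketch with a few made-up rows in the spirit of Mitchell's Play-Tennis data (the DataFrame and column names here are illustrative, not the book's full table):

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_graphviz

    # A few rows in the spirit of Mitchell's Play-Tennis data (illustrative).
    df = pd.DataFrame({
        "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain"],
        "Humidity": ["High", "Normal", "High", "High", "Normal"],
        "Wind":     ["Weak", "Strong", "Weak", "Strong", "Weak"],
        "Play":     ["No", "Yes", "Yes", "No", "Yes"],
    })

    # One-hot encode the categorical inputs; the tree needs numeric features.
    X = pd.get_dummies(df.drop(columns="Play"))
    y = df["Play"]

    # criterion="entropy" matches the information-gain criterion ID3 uses.
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

    # Splits now read like "Outlook_Sunny <= 0.5" rather than "Outlook = Sunny".
    print(export_graphviz(clf, out_file=None, feature_names=list(X.columns)))

Because the encoding yields binary splits on indicator columns, the result is an equivalent binary tree, not a literal copy of Mitchell's multi-way ID3 tree.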

Force the left to right order of nodes in graphviz?

Submitted by 社会主义新天地 on 2019-12-24 03:23:52
Question: I want to draw a decision tree chart using graphviz. (The intended graph was shown as an image.) I am using the following DOT source:

    graph a {
        A  [shape=box; label="A"]
        B  [shape=box; label="B"]
        al [shape=none; label="0"]
        bl [shape=none; label="1"]
        br [shape=none; label="0"]
        A -- al [label="0"];
        A -- B  [label="1"];
        B -- bl [label="0"];
        B -- br [label="1"];
    }

However, my resulting graph looks different (image omitted). How can I force the left-to-right order of the nodes generated by graphviz? Furthermore, as …
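
dot places children wherever its layout pass prefers unless told otherwise; the ordering="out" attribute makes each node's out-edges keep their declaration order, which is usually enough to pin left and right children. A sketch using the Python graphviz package (node names mirror the DOT above; exact placement can still vary with the dot version):

    import graphviz

    # ordering="out" keeps each node's out-edges in declaration order,
    # so the child declared first stays on the left.
    g = graphviz.Graph("a", graph_attr={"ordering": "out"})
    g.node("A", shape="box")
    g.node("B", shape="box")
    g.node("al", label="0", shape="none")
    g.node("bl", label="1", shape="none")
    g.node("br", label="0", shape="none")
    g.edge("A", "al", label="0")  # declared first -> drawn on the left
    g.edge("A", "B", label="1")
    g.edge("B", "bl", label="0")
    g.edge("B", "br", label="1")

    print(g.source)                 # the generated DOT text
    # g.render("tree", view=True)   # needs a local graphviz install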

How to solve “The data cannot have more levels than the reference” error when using confusionMatrix?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-24 00:42:04
Question: I'm using R. I divided the data into train and test sets to estimate prediction accuracy. This is my code:

    library("tree")
    credit <- read.csv("C:/Users/Administrator/Desktop/german_credit (2).csv")
    library("caret")
    set.seed(1000)
    intrain <- createDataPartition(y=credit$Creditability, p=0.7, list=FALSE)
    train <- credit[intrain, ]
    test <- credit[-intrain, ]
    treemod <- tree(Creditability ~ ., data=train)
    plot(treemod)
    text(treemod)
    cv.trees <- cv.tree(treemod, FUN=prune.tree)
    plot(cv.trees)
    prune.trees <- prune.tree…
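
The caret error usually means the prediction vector handed to confusionMatrix has more distinct levels than the reference, which is what happens here: Creditability is read in as numeric, so tree() fits a regression tree and predict() returns continuous scores rather than two class labels. The same pitfall and its generic fix, discretizing scores before tabulating, in a Python/scikit-learn sketch (in R itself, the fix would be converting Creditability to a factor before fitting):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import confusion_matrix

    rng = np.random.RandomState(0)
    X = rng.rand(200, 4)
    y = (X[:, 0] > 0.5).astype(int)  # a 0/1 outcome stored as plain numbers

    # A regression tree predicts continuous scores, not the two labels...
    reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
    scores = reg.predict(X)

    # ...so tabulating raw scores against a binary reference reproduces the
    # "more levels than the reference" mismatch. Discretize first:
    pred = (scores >= 0.5).astype(int)
    print(confusion_matrix(y, pred))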

Visualizing decision tree not using graphviz/web

Submitted by 痞子三分冷 on 2019-12-23 19:19:19
Question: Due to some restrictions I cannot use graphviz or webgraphviz.com to visualize a decision tree (the work network is closed off from the outside world). Is there an alternative utility, or some Python code, for at least a very simple visualization, maybe just an ASCII visualization, of a decision tree (Python/sklearn)? I can use sklearn's tree.export_graphviz(), which produces a text file with the tree structure, from which one can read off the tree, but doing it by eye is not pleasant... PS …
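
scikit-learn itself ships two graphviz-free renderers: sklearn.tree.export_text, a plain-ASCII listing added in version 0.21, and sklearn.tree.plot_tree, which draws with matplotlib only. A minimal sketch on a bundled dataset:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(iris.data, iris.target)

    # Plain-text tree: no graphviz, no network access, just stdout.
    print(export_text(clf, feature_names=list(iris.feature_names)))

    # Matplotlib rendering, also graphviz-free.
    plot_tree(clf, feature_names=iris.feature_names, filled=True)
    plt.show()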

Where does scikit-learn hold the decision labels of each leaf node in its tree structure?

Submitted by 夙愿已清 on 2019-12-23 07:26:27
Question: I have trained a random forest model using scikit-learn and now I want to save its tree structures to a text file so I can use them elsewhere. According to this link, a tree object consists of a number of parallel arrays, each holding some information about different nodes of the tree (e.g. left child, right child, which feature it examines, ...). However, there seems to be no information about the class label corresponding to each leaf node! It's not even mentioned in the examples provided in the …
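
The label is not stored as a separate array: tree_.value holds, for every node, the per-class sample counts (normalized to fractions in recent scikit-learn releases), and a leaf's predicted label is the argmax of that row mapped through the estimator's classes_. A sketch over one tree of a forest:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    iris = load_iris()
    forest = RandomForestClassifier(n_estimators=3, random_state=0)
    forest.fit(iris.data, iris.target)

    est = forest.estimators_[0]   # one DecisionTreeClassifier of the ensemble
    tree = est.tree_

    # tree_.value has shape (n_nodes, n_outputs, n_classes); leaves are the
    # nodes whose children are set to -1.
    is_leaf = tree.children_left == -1
    for node_id in np.where(is_leaf)[0]:
        dist = tree.value[node_id, 0]           # class distribution at the leaf
        label = est.classes_[np.argmax(dist)]   # majority class = leaf's label
        print(f"leaf {node_id}: distribution={dist}, label={label}")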

Can you get the selected leaf from a DecisionTreeRegressor in scikit-learn

Submitted by 时光怂恿深爱的人放手 on 2019-12-23 06:01:51
Question: I'm just reading this great paper and trying to implement this: "... We treat each individual tree as a categorical feature that takes as value the index of the leaf an instance ends up falling in. We use 1-of-K coding of this type of features. For example, consider the boosted tree model in Figure 1 with 2 subtrees, where the first subtree has 3 leafs and the second 2 leafs. If an instance ends up in leaf 2 in the first subtree and leaf 1 in second subtree, the overall input to the linear …"
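
scikit-learn exposes exactly this through the apply() method: DecisionTreeRegressor.apply(X), and likewise apply() on the gradient-boosting ensembles, returns the index of the leaf each sample falls into, and a OneHotEncoder then produces the paper's 1-of-K features. A minimal sketch on synthetic data:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.preprocessing import OneHotEncoder

    X, y = make_regression(n_samples=500, n_features=8, random_state=0)

    gbm = GradientBoostingRegressor(n_estimators=20, max_depth=3,
                                    random_state=0)
    gbm.fit(X, y)

    # For each sample, the index of the leaf it reaches in every tree;
    # shape is (n_samples, n_estimators) for a regressor.
    leaves = gbm.apply(X)

    # 1-of-K encode the leaf indices, one categorical feature per tree,
    # ready to feed a downstream linear model as in the paper.
    encoded = OneHotEncoder().fit_transform(leaves)
    print(leaves.shape, encoded.shape)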
