decision-tree

Binary decision tree model when the proportion of one of the labels is almost null

Submitted by 可紊 on 2019-12-24 20:25:00
Question: I want to build a decision tree that predicts one of two labels, "YES" or "NO". The dataset I am working with is 99% "YES" answers and only 1% "NO" answers. When I ran the model, it scored up to 97% accuracy. Is this a valid model, or are there considerations to take into account when working with such unbalanced proportions? I am afraid that, because of the large amount of "YES" data, the model looks accurate simply by answering "YES" to everything. The "NO"s are very …
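
With a 99:1 split, raw accuracy says little: a model that answers "YES" to everything already scores 99%. A minimal sketch of two common remedies in scikit-learn, class weighting and per-class metrics; the data below is synthetic and purely illustrative:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import classification_report, balanced_accuracy_score

    # Toy data with a 99:1 "YES"/"NO" imbalance (illustrative only).
    rng = np.random.RandomState(0)
    X = rng.rand(10000, 5)
    y = np.where(rng.rand(10000) < 0.01, "NO", "YES")

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    # class_weight="balanced" re-weights classes inversely to frequency,
    # penalizing the tree for predicting "YES" across the board.
    clf = DecisionTreeClassifier(class_weight="balanced", random_state=0)
    clf.fit(X_train, y_train)

    pred = clf.predict(X_test)
    # Per-class precision/recall expose what raw accuracy hides on 99:1 data.
    print(classification_report(y_test, pred))
    print("balanced accuracy:", balanced_accuracy_score(y_test, pred))

The number to watch is recall on the "NO" class; that is the one an always-"YES" model cannot fake.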

What splitting criterion does Random Tree in Weka 3.7.11 use for numerical attributes?

Submitted by 坚强是说给别人听的谎言 on 2019-12-24 17:38:10
Question: I'm using RandomForest from Weka 3.7.11, which in turn bags Weka's RandomTree. My input attributes are numerical, and the output attribute (label) is also numerical. When training the RandomTree, K attributes are chosen at random at each node of the tree; several splits based on those attributes are attempted and the "best" one is chosen. How does Weka determine which split is best in this (numerical) case? For nominal attributes I believe Weka uses the information gain criterion, which …
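
For a numeric target, the standard criterion for regression trees is variance reduction: pick the threshold that most reduces the weighted label variance across the two children. Whether Weka's RandomTree implements exactly this is best confirmed in its source; the sketch below (plain NumPy, all names my own) only illustrates the criterion itself:

    import numpy as np

    def variance_reduction(y, left_mask):
        """Weighted decrease in label variance achieved by a candidate split."""
        y_left, y_right = y[left_mask], y[~left_mask]
        if len(y_left) == 0 or len(y_right) == 0:
            return 0.0
        n = len(y)
        weighted = (len(y_left) * np.var(y_left)
                    + len(y_right) * np.var(y_right)) / n
        return np.var(y) - weighted

    def best_numeric_split(x, y):
        """Try midpoints between consecutive distinct values of x."""
        best_gain, best_threshold = 0.0, None
        values = np.unique(x)
        for threshold in (values[:-1] + values[1:]) / 2:
            gain = variance_reduction(y, x <= threshold)
            if gain > best_gain:
                best_gain, best_threshold = gain, threshold
        return best_threshold, best_gain

    # The label jumps at x = 5, so the best threshold lands at 4.5.
    x = np.arange(10, dtype=float)
    y = np.where(x < 5, 1.0, 10.0)
    print(best_numeric_split(x, y))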

Scikit Decision tree categorical features

Submitted by ↘锁芯ラ on 2019-12-24 11:37:53
Question: There is a well-known problem in Tom Mitchell's Machine Learning book: build a decision tree from the following data, where Play ball is the target variable (the data table and resulting tree were shown as images). I wonder whether it is possible to build this tree with scikit-learn. I found several examples where a decision tree can be depicted with export_graphviz(clf) and Source(export_graphviz(clf, out_file=None)). However, it looks like scikit-learn doesn't work well with categorical data; the data has to be binarized into …
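
scikit-learn's trees split only on numeric features, so categorical columns are usually one-hot encoded first. A minimal sketch with a few made-up rows in the spirit of Mitchell's Play-Tennis data (the DataFrame and column names here are illustrative, not the book's full table):

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_graphviz

    # A few rows in the spirit of Mitchell's Play-Tennis data (illustrative).
    df = pd.DataFrame({
        "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain"],
        "Humidity": ["High", "Normal", "High", "High", "Normal"],
        "Wind":     ["Weak", "Strong", "Weak", "Strong", "Weak"],
        "Play":     ["No", "Yes", "Yes", "No", "Yes"],
    })

    # One-hot encode the categorical inputs; the tree needs numeric features.
    X = pd.get_dummies(df.drop(columns="Play"))
    y = df["Play"]

    # criterion="entropy" matches the information-gain criterion ID3 uses.
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

    # Splits now read like "Outlook_Sunny <= 0.5" rather than "Outlook = Sunny".
    print(export_graphviz(clf, out_file=None, feature_names=list(X.columns)))

Because the encoding yields binary splits on indicator columns, the result is an equivalent binary tree, not a literal copy of Mitchell's multi-way ID3 tree.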

Force the left to right order of nodes in graphviz?

Submitted by 社会主义新天地 on 2019-12-24 03:23:52
Question: I want to draw a decision tree chart using graphviz. (The intended graph was shown as an image.) I am using the following DOT source:

    graph a {
        A  [shape=box; label="A"]
        B  [shape=box; label="B"]
        al [shape=none; label="0"]
        bl [shape=none; label="1"]
        br [shape=none; label="0"]
        A -- al [label="0"];
        A -- B  [label="1"];
        B -- bl [label="0"];
        B -- br [label="1"];
    }

However, my resulting graph looks different (image omitted). How can I force the left-to-right order of the nodes generated by graphviz? Furthermore, as …
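
dot places children wherever its layout pass prefers unless told otherwise; the ordering="out" attribute makes each node's out-edges keep their declaration order, which is usually enough to pin left and right children. A sketch using the Python graphviz package (node names mirror the DOT above; exact placement can still vary with the dot version):

    import graphviz

    # ordering="out" keeps each node's out-edges in declaration order,
    # so the child declared first stays on the left.
    g = graphviz.Graph("a", graph_attr={"ordering": "out"})
    g.node("A", shape="box")
    g.node("B", shape="box")
    g.node("al", label="0", shape="none")
    g.node("bl", label="1", shape="none")
    g.node("br", label="0", shape="none")
    g.edge("A", "al", label="0")  # declared first -> drawn on the left
    g.edge("A", "B", label="1")
    g.edge("B", "bl", label="0")
    g.edge("B", "br", label="1")

    print(g.source)                 # the generated DOT text
    # g.render("tree", view=True)   # needs a local graphviz install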

How to solve “The data cannot have more levels than the reference” error when using confusionMatrix?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-24 00:42:04
Question: I'm using R. I divided the data into train and test sets to estimate prediction accuracy. This is my code:

    library("tree")
    credit <- read.csv("C:/Users/Administrator/Desktop/german_credit (2).csv")
    library("caret")
    set.seed(1000)
    intrain <- createDataPartition(y=credit$Creditability, p=0.7, list=FALSE)
    train <- credit[intrain, ]
    test <- credit[-intrain, ]
    treemod <- tree(Creditability ~ ., data=train)
    plot(treemod)
    text(treemod)
    cv.trees <- cv.tree(treemod, FUN=prune.tree)
    plot(cv.trees)
    prune.trees <- prune.tree…
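
The caret error usually means the prediction vector handed to confusionMatrix has more distinct levels than the reference, which is what happens here: Creditability is read in as numeric, so tree() fits a regression tree and predict() returns continuous scores rather than two class labels. The same pitfall and its generic fix, discretizing scores before tabulating, in a Python/scikit-learn sketch (in R itself, the fix would be converting Creditability to a factor before fitting):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor
    from sklearn.metrics import confusion_matrix

    rng = np.random.RandomState(0)
    X = rng.rand(200, 4)
    y = (X[:, 0] > 0.5).astype(int)  # a 0/1 outcome stored as plain numbers

    # A regression tree predicts continuous scores, not the two labels...
    reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
    scores = reg.predict(X)

    # ...so tabulating raw scores against a binary reference reproduces the
    # "more levels than the reference" mismatch. Discretize first:
    pred = (scores >= 0.5).astype(int)
    print(confusion_matrix(y, pred))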

Visualizing decision tree not using graphviz/web

Submitted by 痞子三分冷 on 2019-12-23 19:19:19
Question: Due to some restrictions I cannot use graphviz or webgraphviz.com to visualize a decision tree (the work network is closed off from the outside world). Is there an alternative utility, or some Python code, for at least a very simple visualization, maybe just an ASCII visualization, of a decision tree (Python/sklearn)? I can use sklearn's tree.export_graphviz(), which produces a text file with the tree structure, from which one can read off the tree, but doing it by eye is not pleasant... PS …
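
scikit-learn itself ships two graphviz-free renderers: sklearn.tree.export_text, a plain-ASCII listing added in version 0.21, and sklearn.tree.plot_tree, which draws with matplotlib only. A minimal sketch on a bundled dataset:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

    iris = load_iris()
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(iris.data, iris.target)

    # Plain-text tree: no graphviz, no network access, just stdout.
    print(export_text(clf, feature_names=list(iris.feature_names)))

    # Matplotlib rendering, also graphviz-free.
    plot_tree(clf, feature_names=iris.feature_names, filled=True)
    plt.show()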

Where does scikit-learn hold the decision labels of each leaf node in its tree structure?

Submitted by 夙愿已清 on 2019-12-23 07:26:27
Question: I have trained a random forest model using scikit-learn and now I want to save its tree structures to a text file so I can use them elsewhere. According to this link, a tree object consists of a number of parallel arrays, each holding some information about different nodes of the tree (e.g. left child, right child, which feature it examines, ...). However, there seems to be no information about the class label corresponding to each leaf node! It's not even mentioned in the examples provided in the …
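
The label is not stored as a separate array: tree_.value holds, for every node, the per-class sample counts (normalized to fractions in recent scikit-learn releases), and a leaf's predicted label is the argmax of that row mapped through the estimator's classes_. A sketch over one tree of a forest:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier

    iris = load_iris()
    forest = RandomForestClassifier(n_estimators=3, random_state=0)
    forest.fit(iris.data, iris.target)

    est = forest.estimators_[0]   # one DecisionTreeClassifier of the ensemble
    tree = est.tree_

    # tree_.value has shape (n_nodes, n_outputs, n_classes); leaves are the
    # nodes whose children are set to -1.
    is_leaf = tree.children_left == -1
    for node_id in np.where(is_leaf)[0]:
        dist = tree.value[node_id, 0]           # class distribution at the leaf
        label = est.classes_[np.argmax(dist)]   # majority class = leaf's label
        print(f"leaf {node_id}: distribution={dist}, label={label}")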

Can you get the selected leaf from a DecisionTreeRegressor in scikit-learn

Submitted by 时光怂恿深爱的人放手 on 2019-12-23 06:01:51
Question: I'm just reading this great paper and trying to implement this: "... We treat each individual tree as a categorical feature that takes as value the index of the leaf an instance ends up falling in. We use 1-of-K coding of this type of features. For example, consider the boosted tree model in Figure 1 with 2 subtrees, where the first subtree has 3 leafs and the second 2 leafs. If an instance ends up in leaf 2 in the first subtree and leaf 1 in second subtree, the overall input to the linear …"
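
scikit-learn exposes exactly this through the apply() method: DecisionTreeRegressor.apply(X), and likewise apply() on the gradient-boosting ensembles, returns the index of the leaf each sample falls into, and a OneHotEncoder then produces the paper's 1-of-K features. A minimal sketch on synthetic data:

    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.preprocessing import OneHotEncoder

    X, y = make_regression(n_samples=500, n_features=8, random_state=0)

    gbm = GradientBoostingRegressor(n_estimators=20, max_depth=3,
                                    random_state=0)
    gbm.fit(X, y)

    # For each sample, the index of the leaf it reaches in every tree;
    # shape is (n_samples, n_estimators) for a regressor.
    leaves = gbm.apply(X)

    # 1-of-K encode the leaf indices, one categorical feature per tree,
    # ready to feed a downstream linear model as in the paper.
    encoded = OneHotEncoder().fit_transform(leaves)
    print(leaves.shape, encoded.shape)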
