decision-tree

Find Distance to Decision Boundary in Decision Trees

Submitted by 扶醉桌前 on 2020-05-08 16:07:56

Question: I want to find the distance of samples to the decision boundary of a trained decision tree classifier in scikit-learn. The features are all numeric, and the feature space can be of any size. So far I have this visualization for an example 2D case, based on here:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_moons

# Generate some example data
X, y = make_moons(noise=0.3, random_state=0)

# Train the
```
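A tree's decision boundary is piecewise axis-aligned, so there is no single closed-form distance. One rough approximation (my own sketch, not from the question) is to measure, for each sample, the gap to the nearest split threshold stored in `tree_.feature` and `tree_.threshold`; since not every split plane is actually part of the boundary near a given point, this is only a lower bound on the true distance:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_moons

X, y = make_moons(noise=0.3, random_state=0)
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

tree_ = clf.tree_
# Internal nodes store the index of the feature they split on;
# leaf nodes are marked with -2 in tree_.feature.
is_split = tree_.feature >= 0
features = tree_.feature[is_split]
thresholds = tree_.threshold[is_split]

def distance_to_nearest_split(x):
    # Smallest axis-aligned gap between sample x and any split plane;
    # a lower bound on the distance to the actual decision boundary.
    return np.min(np.abs(x[features] - thresholds))

d = np.array([distance_to_nearest_split(x) for x in X])
```

An exact distance would additionally require checking, for each split plane, whether crossing it inside its node's region actually changes the predicted class.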

Fit a Decision Tree classifier to the data; Error in code

Submitted by 我的梦境 on 2020-04-22 03:05:49

Question: This is the code I entered into RStudio to create a decision tree; park is a data frame I have in my environment:

```r
people <- park %>%
  select(Subj, Parkinson, fhi, jitter, rap, shimmer, apq, nhr) %>%
  na.omit()
glimpse(people)
tally(~ Parkinson, data = people, format = "percent")  # simple table

################
set.seed(1688)
#############

# Tree with rpart
whoHasPark <- rpart(Parkinson ~ Subj, fhi, jitter, data = people,
                    control = rpart.control(cp = 0.005, minbucket = 30))
whoHasPark
plot(as
```

Decision trees. Choosing thresholds to split objects

Submitted by 只愿长相守 on 2020-03-21 05:15:10

Question: If I understand this correctly, a set of objects (which are arrays of features) is presented and we need to split it into two subsets. To do that we compare some feature x_j to a threshold t_m (t_m being the threshold at node m), and we use an impurity function H() to find the best way to split the objects. But how do we choose the value of t_m, and which feature should be compared to the thresholds? I mean, there is an infinite number of ways we can choose t_m, so we can't just compute the H() function
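The set of thresholds worth testing is actually finite: H() can only change when the split moves past a data point, so for each feature it suffices to test the midpoints between consecutive sorted values, which is essentially what CART-style implementations do. A minimal sketch for one feature, using Gini impurity as H() (the function names are my own):

```python
import numpy as np

def gini(labels):
    # Gini impurity H of a label array: 1 - sum(p_k^2).
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    # Scan only the midpoints between consecutive sorted values of x;
    # between two equal values there is no boundary to place.
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_t, best_score = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue
        t = (xs[i] + xs[i - 1]) / 2.0
        left, right = ys[:i], ys[i:]
        # Weighted average impurity of the two children.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0, 0, 0, 1, 1, 1])
t, s = best_split(x, y)  # splits cleanly between 3.0 and 10.0
```

Repeating this scan over every feature and taking the (feature, threshold) pair with the lowest score answers both questions at once: the candidate t_m values are finite, and the feature is chosen by the same impurity comparison.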

Why doesn't my character sheet work with input() when trying to choose a race in a text based adventure? python3.x

Submitted by 妖精的绣舞 on 2020-02-06 06:23:45

Question: So this is just the beginning of a long line of questions I know I am going to have. In this text-based adventure I would eventually like to have puzzles and multiple branching paths, factions you can join, and dialogue choices that affect the morality of situations (like Mass Effect or KOTOR, but text-based-ish), etc. I feel like the early setup is VERY important for this learning journey. I would also like to eventually convert it over to PyQt5 and maybe eventually

Can't display graphviz tree in Jupyter Notebook

Submitted by 本秂侑毒 on 2020-02-04 04:58:26

Question: I'm trying to display a decision tree in Jupyter Notebook and I keep receiving the message:

CalledProcessError: Command '['dot.bat', '-Tsvg']' returned non-zero exit status 1

I'm using the following code:

```python
from sklearn.datasets import load_iris
from sklearn import tree
import graphviz
from IPython.display import SVG

iris = load_iris()
clf = tree.DecisionTreeClassifier()
fitted_clf = clf.fit(iris.data, iris.target)
graph = graphviz.Source(tree.export_graphviz(fitted_clf, feature_names = iris
```
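That error usually means the Graphviz `dot` executable itself is failing or misconfigured, independent of the Python code. One workaround worth sketching, assuming scikit-learn >= 0.21 and matplotlib are available, is `sklearn.tree.plot_tree`, which renders with matplotlib alone so the broken `dot.bat` never enters the picture:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; in a notebook you would skip this line
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

# plot_tree draws the fitted tree as matplotlib annotations,
# with no Graphviz dependency at all.
fig, ax = plt.subplots(figsize=(12, 8))
annotations = tree.plot_tree(clf,
                             feature_names=iris.feature_names,
                             class_names=list(iris.target_names),
                             filled=True, ax=ax)
fig.savefig("iris_tree.png")
```

If Graphviz output is specifically needed, the underlying fix is making sure the Graphviz binaries are installed and on PATH, but `plot_tree` sidesteps the question entirely.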

Python - Graphviz - Remove legend on nodes of DecisionTreeClassifier

Submitted by 北慕城南 on 2020-02-02 12:09:16

Question: I have a decision tree classifier from sklearn and I use pydotplus to display it. However, I don't really like having so much information on each node for my presentation (entropy, samples, and value). To make it easier to explain to people, I would like to keep only the decision and the class on each node. Where can I modify the code to do this? Thank you.

Answer 1: According to the documentation, it is not possible to abstain from setting the additional information inside boxes. The only thing that you may

R update ctree (package party) features factors levels

Submitted by 本秂侑毒 on 2020-01-25 18:26:25

Question: I am trying to make sure that all my features of type factor are fully represented (in terms of all possible factor levels), both in my tree object and in my test set for prediction:

```r
for (j in 1:length(predictors)) {
  if (is.factor(Test[, j])) {
    ct[[names(predictors)[j]]] <- union(ct$xlevels[[names(predictors)[j]]],
                                        levels(Test[, c(names(predictors)[j])]))
  }
}
```

However, for the object ct (a ctree from package party) I can't seem to understand how to access the features' factor levels, as I am getting an

mapping scikit-learn DecisionTreeClassifier.tree_.value to predicted class

Submitted by 拥有回忆 on 2020-01-24 03:59:04

Question: I am using a scikit-learn DecisionTreeClassifier on a 3-class dataset. After I fit the classifier, I access all leaf nodes via the tree_ attribute in order to get the number of instances of each class that end up in a given node:

```python
clf = tree.DecisionTreeClassifier(max_depth=5)
clf.fit(X, y)

# let's assume there is a leaf node with id 5
print clf.tree_.value[5]
```

This will print out:

>>> array([[  0.,   1.,  68.]])

but... how do I know which position in that array belongs to which class? The
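The column order follows `clf.classes_`, which holds the class labels in sorted order: column i of a `tree_.value` row corresponds to `clf.classes_[i]`. A small self-contained sketch on iris (my own example, not the questioner's data):

```python
import numpy as np
from sklearn import tree
from sklearn.datasets import load_iris

iris = load_iris()
clf = tree.DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(iris.data, iris.target)

# Leaves are the nodes with no children (children_left == -1).
leaf_ids = np.where(clf.tree_.children_left == -1)[0]
node = leaf_ids[0]

# One entry per class, in the order of clf.classes_.
# (Recent scikit-learn versions store weighted fractions here rather
# than raw counts; the argmax-to-class mapping is the same either way.)
per_class = clf.tree_.value[node][0]
predicted = clf.classes_[np.argmax(per_class)]
```

So for the array([[0., 1., 68.]]) in the question, the 68 belongs to `clf.classes_[2]`, whatever label sorts third in that classifier's training targets.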