decision-tree

How to extract the decision rules of a random forest in Python

Submitted by 元气小坏坏 on 2019-11-27 08:36:42
Question: I have one question though. I heard from someone that in R you can use extra packages to extract the decision rules implemented in a random forest. I tried to google the same thing for Python, but without luck. Any help on how to achieve that would be appreciated. Thanks in advance!
Answer 1: Assuming that you use the sklearn RandomForestClassifier, you can find the individual decision trees as .estimators_. Each tree stores its decision nodes as a number of NumPy arrays under tree_. Here is some example code which just
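The answer's own snippet is cut off above. As a sketch of the approach it describes (the dataset and variable names here are illustrative, not the answerer's), the per-tree node arrays can be inspected like this:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

# A fitted forest exposes its individual trees as .estimators_;
# each tree stores its structure as parallel NumPy arrays under .tree_.
first_tree = forest.estimators_[0].tree_
print(first_tree.node_count)          # total number of nodes in this tree
print(first_tree.feature[:5])         # feature index tested at each node (-2 = leaf)
print(first_tree.threshold[:5])       # split threshold at each node
print(first_tree.children_left[:5])   # left-child node ids (-1 = leaf)
```

The arrays are indexed by node id, with node 0 as the root, so walking children_left / children_right from 0 traverses the whole tree.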

C5.0 decision tree - c50 code called exit with value 1

Submitted by 三世轮回 on 2019-11-27 06:51:38
Question: I am getting the following error:

    c50 code called exit with value 1

I am doing this on the Titanic data available from Kaggle:

    # Importing datasets
    train <- read.csv("train.csv", sep=",")
    # this is the structure
    str(train)

Output:

    'data.frame': 891 obs. of 12 variables:
     $ PassengerId: int 1 2 3 4 5 6 7 8 9 10 ...
     $ Survived   : int 0 1 1 1 0 0 0 0 1 1 ...
     $ Pclass     : int 3 1 3 1 3 3 1 3 3 2 ...
     $ Name       : Factor w/ 891 levels "Abbing, Mr. Anthony",..: 109 191 358 277 16 559 520 629 417 581 ...
     $

How do I visualise / plot a decision tree in Apache Spark (PySpark 1.4.1)?

Submitted by 筅森魡賤 on 2019-11-27 06:06:58
Question: I am using Apache Spark MLlib 1.4.1 (PySpark, the Python implementation of Spark) to generate a decision tree from LabeledPoint data I have. The tree generates correctly, and I can print it to the terminal (extract the rules, as this user calls it: How to extract rules from decision tree spark MLlib) using:

    model = DecisionTree.trainClassifier( ... )
    print(model.toDebugString())

But what I want to do is visualise or plot the decision tree rather than printing it to the terminal. Is there any

Decision Tree in Matlab

Submitted by 若如初见. on 2019-11-27 00:48:14
Question: I saw the help in Matlab, but it provides an example without explaining how to use the parameters of the 'classregtree' function. Any help explaining the use of 'classregtree' and its parameters would be appreciated.
Answer 1: The documentation page of the classregtree function is self-explanatory. Let's go over some of the most common parameters of the classification tree model:

    x : data matrix; rows are instances, columns are predicting attributes
    y : column vector; class label for each

Visualizing decision tree in scikit-learn

Submitted by ⅰ亾dé卋堺 on 2019-11-26 22:37:18
Question: I am trying to design a simple decision tree using scikit-learn in Python (I am using Anaconda's IPython Notebook with Python 2.7.3 on Windows) and visualise it as follows:

    from pandas import read_csv, DataFrame
    from sklearn import tree
    from os import system

    data = read_csv('D:/training.csv')
    Y = data.Y
    X = data.ix[:, "X0":"X33"]
    dtree = tree.DecisionTreeClassifier(criterion="entropy")
    dtree = dtree.fit(X, Y)
    dotfile = open("D:/dtree2.dot", 'w')
    dotfile = tree.export_graphviz(dtree, out
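A minimal, self-contained sketch of the same workflow (using the bundled iris data in place of the asker's D:/training.csv, and export_graphviz's out_file=None mode, which in recent scikit-learn versions returns the DOT source as a string):

```python
from sklearn.datasets import load_iris
from sklearn import tree

X, y = load_iris(return_X_y=True)
dtree = tree.DecisionTreeClassifier(criterion="entropy", max_depth=2)
dtree.fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string,
# which can be written to a .dot file or rendered with the graphviz package.
dot_source = tree.export_graphviz(dtree, out_file=None)
with open("dtree2.dot", "w") as dotfile:
    dotfile.write(dot_source)
```

The resulting file can then be rendered outside Python, e.g. with Graphviz's command-line tool: dot -Tpng dtree2.dot -o dtree2.png.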

Why is Random Forest with a single tree much better than a Decision Tree classifier?

Submitted by 眉间皱痕 on 2019-11-26 17:48:32
Question: I am learning machine learning with the scikit-learn library. I apply the decision tree classifier and the random forest classifier to my data with this code:

    def decision_tree(train_X, train_Y, test_X, test_Y):
        clf = tree.DecisionTreeClassifier()
        clf.fit(train_X, train_Y)
        return clf.score(test_X, test_Y)

    def random_forest(train_X, train_Y, test_X, test_Y):
        clf = RandomForestClassifier(n_estimators=1)
        clf = clf.fit(X, Y)
        return clf.score(test_X, test_Y)

Why are the results so much better for the
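Note that the snippet above fits the forest on X, Y rather than on train_X, train_Y, which by itself could explain the gap (the forest sees the test rows during training). A side-by-side comparison with that fixed (a sketch on the iris data, not the asker's data) still shows the two models are not identical: by default a single-tree forest bootstraps its training samples and considers only a random subset of features at each split.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
train_X, test_X, train_Y, test_Y = train_test_split(X, y, random_state=0)

dt = DecisionTreeClassifier(random_state=0).fit(train_X, train_Y)
# n_estimators=1 still differs from a plain tree: bootstrap sampling
# and random feature subsetting (max_features="sqrt") are on by default.
rf = RandomForestClassifier(n_estimators=1, random_state=0).fit(train_X, train_Y)

dt_score = dt.score(test_X, test_Y)
rf_score = rf.score(test_X, test_Y)
print(dt_score, rf_score)
```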

What does `sample_weight` do to the way a `DecisionTreeClassifier` works in sklearn?

Submitted by 倖福魔咒の on 2019-11-26 16:02:22
Question: I've read in this documentation that: "Class balancing can be done by sampling an equal number of samples from each class, or preferably by normalizing the sum of the sample weights (sample_weight) for each class to the same value." But it is still unclear to me how this works. If I set sample_weight with an array of only two possible values, 1's and 2's, does this mean that the samples with 2's will get sampled twice as often as the samples with 1's when doing the bagging? I cannot
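One way to probe this empirically (a sketch, not taken from the original thread): a plain DecisionTreeClassifier does no resampling at all, so a weight of 2 enters the impurity computation exactly as if the sample appeared twice. On this tiny toy dataset that can be checked by comparing against literally duplicating the row:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Weight the third sample as if it occurred twice...
w = np.array([1.0, 1.0, 2.0, 1.0])
tree_weighted = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=w)

# ...and compare against a dataset with that row actually duplicated.
X_dup = np.vstack([X, [[2.0]]])
y_dup = np.append(y, 1)
tree_dup = DecisionTreeClassifier(random_state=0).fit(X_dup, y_dup)

grid = np.linspace(-1, 4, 11).reshape(-1, 1)
same = np.array_equal(tree_weighted.predict(grid), tree_dup.predict(grid))
print(same)
```

Bagging (the repeated resampling the asker mentions) is a property of ensemble estimators such as RandomForestClassifier, not of a single DecisionTreeClassifier.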

Passing categorical data to Sklearn Decision Tree

Submitted by ぃ、小莉子 on 2019-11-26 15:37:01
Question: There are several posts about how to encode categorical data for sklearn decision trees, but the sklearn documentation lists these among the advantages of decision trees: "(...) Able to handle both numerical and categorical data. Other techniques are usually specialised in analysing datasets that have only one type of variable. See algorithms for more information." But running the following script:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    data = pd.DataFrame()
    data[
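The asker's script is truncated above, but the underlying issue is that scikit-learn's tree implementation expects numeric input, so categorical columns are usually encoded first. A common sketch (illustrative column names, not the asker's) uses pandas get_dummies for one-hot encoding:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "color": ["red", "green", "green", "red"],
    "size":  [1, 2, 2, 1],
    "label": [0, 1, 1, 0],
})

# One-hot encode the categorical column so every feature is numeric.
X = pd.get_dummies(data[["color", "size"]], columns=["color"])
y = data["label"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict(X)
print(pred)
```

sklearn's OneHotEncoder does the same job inside a Pipeline when the encoding must be reapplied to unseen data.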

How to extract the decision rules from scikit-learn decision-tree?

Submitted by 爷,独闯天下 on 2019-11-26 00:32:34
Question: Can I extract the underlying decision rules (or 'decision paths') from a trained decision tree as a textual list? Something like:

    if A > 0.4 then
        if B < 0.2 then
            if C > 0.8 then
                class = 'X'

Thanks for your help.
Answer 1: I believe that this answer is more correct than the other answers here:

    from sklearn.tree import _tree

    def tree_to_code(tree, feature_names):
        tree_ = tree.tree_
        feature_name = [
            feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
            for i in tree_.feature
        ]
        print
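The answer's function is truncated above. A runnable completion in the same spirit (the recursion below is reconstructed, not the answerer's exact code) walks the tree_ arrays and emits one indented line per rule:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, _tree

def tree_to_rules(tree, feature_names):
    """Return the decision rules of a fitted tree as indented text lines."""
    tree_ = tree.tree_
    lines = []

    def recurse(node, depth):
        indent = "    " * depth
        if tree_.feature[node] != _tree.TREE_UNDEFINED:  # internal node
            name = feature_names[tree_.feature[node]]
            threshold = tree_.threshold[node]
            lines.append(f"{indent}if {name} <= {threshold:.2f}:")
            recurse(tree_.children_left[node], depth + 1)
            lines.append(f"{indent}else:  # {name} > {threshold:.2f}")
            recurse(tree_.children_right[node], depth + 1)
        else:  # leaf: report the majority class
            lines.append(f"{indent}return class {tree_.value[node].argmax()}")

    recurse(0, 0)
    return lines

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
rules = tree_to_rules(clf, iris.feature_names)
for line in rules:
    print(line)
```

Note that sklearn's internal convention is "<=" for the left branch, which is why the output reads "if feature <= threshold" rather than the ">" form in the question.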