Python - Graphviz - Remove legend on nodes of DecisionTreeClassifier

Question

I have a decision tree classifier from sklearn and I use pydotplus to display it. However, for my presentation I don't like having so much information on each node (entropy, samples and value).

To make it easier to explain to people, I would like to keep only the decision and the class on each node. Where can I modify the code to do this?

Thank you.

Answer 1:

According to the documentation, export_graphviz has no option to leave the additional information out of the node boxes. The only item you can switch off directly is the impurity, via the impurity parameter.

However, I have done it explicitly with a workaround that is somewhat hacky. First, I save the .dot file with impurity set to False. Then I read it back in and convert it to a string. Finally, I use regular expressions to strip the redundant label lines and save the result to a new file.

The code goes like this:

import re

import pydotplus  # install it via pip install pydotplus
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import export_graphviz
from sklearn.datasets import load_iris

data = load_iris()
clf = DecisionTreeClassifier()
clf.fit(data.data, data.target)

PATH = '/path/to/dotfile/tree.dot'

# Write the tree to a .dot file without the impurity line.
export_graphviz(clf, out_file=PATH, impurity=False, class_names=True)

# Read the same file back as a single string.
f = pydotplus.graph_from_dot_file(PATH).to_string()

# Inner nodes: drop "\nsamples = ...\nvalue = [...]" after the split condition.
f = re.sub(r'\\nsamples = [0-9]+\\nvalue = \[[^\]]*\]', '', f)
# Leaf nodes: drop "samples = ...\nvalue = [...]\n" at the start of the label.
f = re.sub(r'samples = [0-9]+\\nvalue = \[[^\]]*\]\\n', '', f)

with open('tree_modified.dot', 'w') as file:
    file.write(f)

Here are the images before and after modification:

In your case, there seem to be more fields in the boxes (entropy, for example), so you may need to adjust the regular expressions a little.
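If the boxes contain other fields, one way to adapt the idea is to strip label lines by field name instead of hard-coding the samples/value pair. The following is only a sketch: strip_label_fields is a hypothetical helper (not part of sklearn or pydotplus), and it assumes the default export_graphviz label format, where each field sits on its own line as "name = ..." and lines are separated by a literal \n inside label="...".

```python
import re


def strip_label_fields(dot_source, fields=("entropy", "gini", "samples", "value")):
    """Remove 'name = ...' lines for the given fields from node labels.

    Hypothetical helper; assumes labels of the form label="a\\nb\\nc".
    """
    names = "|".join(re.escape(name) for name in fields)
    # One label line: field name, " = ", then everything up to the next
    # backslash (start of a literal \n) or the closing quote.
    field = rf'(?:{names}) = [^\\"]*'
    # Field in the middle or at the end of a label: remove it with its leading \n.
    dot_source = re.sub(rf'\\n{field}', '', dot_source)
    # Field at the very start of a label (leaf nodes): remove it with its trailing \n.
    dot_source = re.sub(rf'(label="){field}(?:\\n)?', r'\1', dot_source)
    return dot_source


# Hand-written sample in the style of export_graphviz output.
sample = (
    'digraph Tree {\n'
    '0 [label="X[3] <= 0.8\\nentropy = 1.585\\nsamples = 150'
    '\\nvalue = [50, 50, 50]\\nclass = y[0]"] ;\n'
    '1 [label="entropy = 0.0\\nsamples = 50'
    '\\nvalue = [50, 0, 0]\\nclass = y[0]"] ;\n'
    '0 -> 1 ;\n'
    '}'
)

cleaned = strip_label_fields(sample)
print(cleaned)
```

To keep a field, leave it out of fields, for example strip_label_fields(text, fields=("entropy", "value")) keeps the samples line. Labels produced with special_characters=True use a different format and would need different patterns.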

I hope that helps!

Source: https://stackoverflow.com/questions/44821349/python-graphviz-remove-legend-on-nodes-of-decisiontreeclassifier

Tags

python

scikit-learn

decision-tree