Python, PyDot and DecisionTree

Anonymous (unverified), submitted 2019-12-03 02:52:02

Question:

I'm trying to visualize my DecisionTree, but I'm getting an error. The code is:

    X = [i[1:] for i in dataset]  # attribute
    y = [i[0] for i in dataset]
    clf = tree.DecisionTreeClassifier()

    dot_data = StringIO()
    tree.export_graphviz(clf.fit(train_X, train_y), out_file=dot_data)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("tree.pdf")

And the error is:

    Traceback (most recent call last):
        if data.startswith(codecs.BOM_UTF8):
    TypeError: startswith first arg must be str or a tuple of str, not bytes

Can anyone explain what the problem is? Thanks a lot!

Answer 1:

I had the same exact problem and just spent a couple hours trying to figure this out. I can't guarantee what I share here will work for others but it may be worth a shot.

  1. I tried installing official pydot packages but I have Python 3 and they simply did not work. After finding a note in a thread from one of the many websites I scoured through, I ended up installing this forked repository of pydot.
  2. I went to graphviz.org and installed their software on my Windows 7 machine. If you don't have Windows, look under their Download section for your system.
  3. After a successful install, I opened Environment Variables (Control Panel\All Control Panel Items\System\Advanced system settings > Environment Variables button), found the variable path under System variables, clicked Edit..., and appended ;C:\Program Files (x86)\Graphviz2.38\bin to the end of the Variable value: field.
  4. To confirm I can now use dot commands in the Command Line (Windows Command Processor), I typed dot -V which returned dot - graphviz version 2.38.0 (20140413.2041).
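The check in step 4 can also be done from Python before calling pydot. This is a minimal sketch using only the standard library; find_graphviz_dot and dot_version are hypothetical helper names, not part of pydot or Graphviz:

```python
import shutil
import subprocess


def find_graphviz_dot(executable="dot"):
    """Return the full path of the Graphviz `dot` binary, or None if it is not on PATH."""
    return shutil.which(executable)


def dot_version(dot_path):
    """Run `dot -V` and return the version banner it prints."""
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    # Graphviz prints its version banner to stderr; fall back to stdout just in case.
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    path = find_graphviz_dot()
    if path is None:
        print("dot not found: add the Graphviz bin directory to PATH")
    else:
        print(dot_version(path))
```

If find_graphviz_dot() returns None, the PATH edit from step 3 has probably not taken effect yet; opening a new terminal or restarting the notebook usually picks it up.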

In the code below, keep in mind that I'm reading a dataframe from my clipboard. You might be reading it from a file or somewhere else.

In IPython Notebook:

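    import pandas as pd
    import numpy as np
    from sklearn import tree
    import pydot
    from IPython.display import Image
    # sklearn.externals.six was removed from newer scikit-learn releases;
    # io.StringIO does the same job on Python 3.
    from io import StringIO

    # Last column is the target, everything before it is a feature.
    df = pd.read_clipboard()
    X = df[df.columns[:-1]]
    y = df[df.columns[-1]]

    dtr = tree.DecisionTreeRegressor(max_depth=3)
    dtr.fit(X, y)

    # Write the tree as DOT text into an in-memory buffer, then render it.
    dot_data = StringIO()
    tree.export_graphviz(dtr, out_file=dot_data, feature_names=X.columns)
    graph = pydot.graph_from_dot_data(dot_data.getvalue())
    Image(graph.create_png())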

Alternatively, if you're not using IPython, you can generate your own image from the command line as long as you have graphviz installed (step 2 above). Using my same example code above, you use this line after fitting the model:

    tree.export_graphviz(dtr, out_file='treepic.dot', feature_names=X.columns)

then open up command prompt where the treepic.dot file is and enter this command line:

dot -T png treepic.dot -o treepic.png 

A .png file should be created with your decision tree.
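The same conversion can be driven from Python instead of a command prompt. This is a sketch, assuming Graphviz is installed and dot is on PATH; build_dot_command and render_dot_file are hypothetical helper names, and the command uses the attached form -Tpng for the output format:

```python
import subprocess


def build_dot_command(dot_file, out_file, fmt="png"):
    """Build the argument list for rendering a .dot file with Graphviz."""
    # -T<fmt> selects the output format, -o names the output file.
    return ["dot", f"-T{fmt}", dot_file, "-o", out_file]


def render_dot_file(dot_file, out_file, fmt="png"):
    """Run Graphviz; raises CalledProcessError if dot exits with an error."""
    subprocess.run(build_dot_command(dot_file, out_file, fmt), check=True)


if __name__ == "__main__":
    # Only prints the command; call render_dot_file(...) to actually run it.
    print(build_dot_command("treepic.dot", "treepic.png"))
```

Calling render_dot_file("treepic.dot", "treepic.png") raises FileNotFoundError when dot is not on PATH, which is a quick way to tell a PATH problem apart from a bad .dot file.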



Answer 2:

If you are on Python 3, just use pydotplus instead of pydot. It also installs easily with pip.

    import pydotplus

    <your code>

    dot_data = StringIO()
    tree.export_graphviz(clf, out_file=dot_data)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("iris.pdf")


Answer 3:

The line in question checks whether the stream/file starts with a UTF-8 byte order mark (BOM).

Instead of:

if data.startswith(codecs.BOM_UTF8): 

use:

if codecs.BOM_UTF8 in data: 

You will likely have more success...
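For context, on Python 3 str and bytes cannot be mixed in either startswith or in (both raise TypeError), so a BOM check has to compare like with like. This is a minimal sketch of a type-aware test; has_utf8_bom is a hypothetical helper, not a function from pydot:

```python
import codecs


def has_utf8_bom(data):
    """Return True if data (str or bytes) begins with a UTF-8 byte order mark."""
    if isinstance(data, bytes):
        # Raw bytes: compare against the three-byte BOM sequence.
        return data.startswith(codecs.BOM_UTF8)
    # A decoded str carries the BOM as the single character U+FEFF.
    return data.startswith("\ufeff")


if __name__ == "__main__":
    print(has_utf8_bom("digraph Tree {}"))
    print(has_utf8_bom(codecs.BOM_UTF8 + b"digraph Tree {}"))
```

The string returned by dot_data.getvalue() is a str, which is why the bytes-based check in the traceback fails.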


