How do I visualise / plot a decision tree in Apache Spark (PySpark 1.4.1)?

Submitted by 妖精的绣舞 on 2019-12-07 06:53:58

Question


I am using Apache Spark MLlib 1.4.1 (PySpark, the Python API for Spark) to train a decision tree on LabeledPoint data. The tree trains correctly, and I can print it to the terminal (that is, extract the rules, as the question "How to extract rules from decision tree spark MLlib" puts it) using:

from pyspark.mllib.tree import DecisionTree
model = DecisionTree.trainClassifier( ... )
print(model.toDebugString())

What I want, though, is to visualize or plot the decision tree rather than print it to the terminal. Is there a way to plot the decision tree in PySpark, or can I save the decision tree data and plot it with R? Thanks!


Answer 1:


The Decision-Tree-Visualization-Spark project can visualize a decision tree model.

It works in two steps:

  • Parse Spark Decision Tree output to a JSON format.
  • Use the JSON file as an input to a D3.js visualization.

For the parser, see Dt.py in that project.

The input to its function tree_json(tree) is the output of your model's toDebugString().
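To give an idea of what the parsing step involves, here is a minimal sketch. It is not the project's Dt.py; the function name parse_debug_string and the SAMPLE text are hypothetical, and the sketch assumes the debug string consists of "If (...)", "Else (...)" and "Predict: ..." lines in depth-first order, with every "If" paired with an "Else". The "name"/"children" keys follow the shape commonly used for D3.js tree layouts, which is also an assumption.

```python
import json

# Hypothetical sample in the shape printed by model.toDebugString();
# real output depends on your data and Spark version.
SAMPLE = """DecisionTreeModel classifier of depth 2 with 5 nodes
  If (feature 0 <= 0.5)
   If (feature 1 <= 1.0)
    Predict: 0.0
   Else (feature 1 > 1.0)
    Predict: 1.0
  Else (feature 0 > 0.5)
   Predict: 1.0
"""


def parse_debug_string(debug_string):
    """Turn the debug text into a nested dict of 'name'/'children' nodes."""
    stripped = (line.strip() for line in debug_string.splitlines())
    # Keep only tree lines; this drops the header and any blank lines.
    body = iter([s for s in stripped if s.startswith(("If", "Else", "Predict"))])

    def parse_node():
        line = next(body)
        if line.startswith("Predict"):
            return {"name": line}  # leaf node
        # "If (condition)": keep the text between the outer parentheses.
        condition = line[line.index("(") + 1:line.rindex(")")]
        left = parse_node()   # subtree where the condition holds
        next(body)            # skip the matching "Else (...)" line
        right = parse_node()  # subtree where the condition fails
        return {"name": condition, "children": [left, right]}

    return parse_node()


tree = parse_debug_string(SAMPLE)
tree_as_json = json.dumps(tree, indent=2)
print(tree_as_json)
```

In practice you would pass model.toDebugString() instead of SAMPLE and write tree_as_json to the file that the D3.js page loads.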

Answer from question




Answer 2:


Although this is an old post, I am adding an answer for the benefit of anyone who finds it later.

Alternatively, you can use the "graphviz" Python package alongside PySpark. It renders the decision tree as a tree diagram rather than the usual nested if/else text.

More details can be found here: https://pypi.python.org/pypi/graphviz
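The answer does not show how the model reaches Graphviz, so the following is only a sketch of one possible route. It assumes the tree has already been parsed into nested dicts with "name" and optional "children" keys (a hypothetical shape, not a Spark or graphviz format) and builds Graphviz DOT text in plain Python. The function name tree_to_dot and the EXAMPLE_TREE data are made up for illustration.

```python
# Hypothetical parsed tree: inner nodes hold a split condition,
# leaves hold a prediction.
EXAMPLE_TREE = {
    "name": "feature 0 <= 0.5",
    "children": [
        {"name": "Predict: 0.0"},
        {"name": "Predict: 1.0"},
    ],
}


def tree_to_dot(tree):
    """Return Graphviz DOT source describing the nested-dict tree."""
    lines = ["digraph DecisionTree {", "  node [shape=box];"]
    counter = [0]  # mutable counter used to hand out unique node ids

    def visit(node):
        node_id = "n{}".format(counter[0])
        counter[0] += 1
        label = node["name"].replace('"', '\\"')  # escape quotes for DOT
        lines.append('  {} [label="{}"];'.format(node_id, label))
        # First child is the branch where the condition holds, second where it fails.
        for child, edge in zip(node.get("children", []), ("yes", "no")):
            child_id = visit(child)
            lines.append('  {} -> {} [label="{}"];'.format(node_id, child_id, edge))
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


dot_source = tree_to_dot(EXAMPLE_TREE)
print(dot_source)
```

If the graphviz package and the Graphviz binaries are installed, the resulting text can be rendered with graphviz.Source(dot_source).render("tree", format="png"), where "tree" is an arbitrary output file name.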



Source: https://stackoverflow.com/questions/32232940/feedback-visualization-for-apache-spark-decision-trees