Question
Imagine I have a tree like this:
- One
  - One one
  - One two
    - One two one
    - One two two
    - One two three
      - One two three one
  - One three
    - One three one
    - One three two
    - One three three
  - One four
  - One five
Data-wise it's quite simple too, just a child-parent relationship:
+-------------------+---------------+
| Child             | Parent        |
+-------------------+---------------+
| One               |               |
| One one           | One           |
| One two           | One           |
| One two one       | One two       |
| One two two       | One two       |
| One two three     | One two       |
| One two three one | One two three |
| One three         | One           |
| One three one     | One three     |
| One three two     | One three     |
| One three three   | One three     |
| One four          | One           |
| One five          | One           |
+-------------------+---------------+
Now what I'd like to do is:
- I've got a list of two items, let's say "One three three" and "One two three one"
- I'd like to build the rest of the tree from them, that is, their parents up to the root level
In an RDBMS, I'd simply write a recursive query using a CTE and UNION ALL. However, I cannot find out whether that's possible in Spark using Dataset or DataFrame, probably due to my lack of Scala/Python knowledge. Any help would be appreciated.
Output should be as follows:
- One
  - One two
    - One two three
      - One two three one
  - One three
    - One three three
Answer 1:
You can use a GraphX-based solution to perform a recursive query (a parent/child, or hierarchical, query). Many databases provide this functionality as recursive Common Table Expressions (CTEs) or the CONNECT BY SQL clause.
See this article for more information: https://www.qubole.com/blog/processing-hierarchical-data-using-spark-graphx-pregel-api/
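If it helps to see the logic before reaching for GraphX, here is a minimal sketch of what the recursive query computes, written in plain Python rather than Spark. The names `EDGES`, `with_ancestors` and `render` are made up for this illustration. Each pass of the loop stands in for one step of a recursive CTE; in Spark, I'd expect the same idea to be expressible as a loop that joins the current frontier against the child-parent DataFrame until no new parents appear, but treat that mapping as an assumption rather than a tested recipe.

```python
from collections import defaultdict

# (child, parent) pairs from the question's table; None marks the root.
EDGES = [
    ("One", None),
    ("One one", "One"),
    ("One two", "One"),
    ("One two one", "One two"),
    ("One two two", "One two"),
    ("One two three", "One two"),
    ("One two three one", "One two three"),
    ("One three", "One"),
    ("One three one", "One three"),
    ("One three two", "One three"),
    ("One three three", "One three"),
    ("One four", "One"),
    ("One five", "One"),
]


def with_ancestors(edges, targets):
    """Return the targets plus every ancestor up to the root."""
    parent_of = dict(edges)
    keep = set()
    frontier = set(targets)
    # One pass per tree level: look up the parents of the current frontier
    # and continue with those that have not been collected yet.
    while frontier:
        keep |= frontier
        parents = {parent_of.get(node) for node in frontier}
        frontier = {p for p in parents if p is not None} - keep
    return keep


def render(edges, keep):
    """Return the kept nodes as indented list lines, in input order."""
    children = defaultdict(list)
    for child, parent in edges:
        if child in keep:
            children[parent].append(child)

    lines = []

    def walk(node, depth):
        lines.append("  " * depth + "- " + node)
        for child in children[node]:
            walk(child, depth + 1)

    for root in children[None]:
        walk(root, 0)
    return lines


kept = with_ancestors(EDGES, ["One three three", "One two three one"])
print("\n".join(render(EDGES, kept)))
```

For the two items in the question this prints the six-node tree shown under "Output should be as follows". The loop runs once per level between the deepest item and the root, so for shallow hierarchies the number of passes (or joins, in a Spark version) stays small.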
Source: https://stackoverflow.com/questions/44306095/building-hierarchy-using-spark