Optimized SQL for tree structures

耶瑟儿~ 2020-11-28 21:39

How would you get tree-structured data from a database with the best performance? For example, say you have a folder-hierarchy in a database. Where the folder-database-row h

11 answers
  •  误落风尘
    2020-11-28 22:46

    There are several common kinds of queries against a hierarchy. Most other kinds of queries are variations on these.

    1. From a parent, find all children.

      a. To a specific depth. For example, given my immediate parent, all children to a depth of 1 will be my siblings.

      b. To the bottom of the tree.

    2. From a child, find all parents.

      a. To a specific depth. For example, parents to a depth of 1 is just my immediate parent.

      b. To an unlimited depth.

    The (a) cases (a specific depth) are easier in SQL. The special case of depth 1 is trivial. Greater depths are harder, but any fixed, finite depth can be handled with a fixed number of self-joins. The (b) cases, with unbounded depth (to the top or to the bottom of the tree), are really hard.
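    The fixed-depth cases can be sketched roughly as follows. This is only an illustration: the `folder` table, its `parent_id` column, and the sample rows are assumptions, not the asker's schema, and SQLite is used just to keep the example runnable.

```python
import sqlite3

# Hypothetical schema: each folder row stores its parent's id (NULL for the root).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE folder (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO folder VALUES
        (1, NULL, 'root'),
        (2, 1, 'docs'),
        (3, 1, 'src'),
        (4, 2, 'drafts'),
        (5, 4, 'old');
""")

def children(conn, folder_id):
    """Depth 1: a single lookup on parent_id, no join needed."""
    rows = conn.execute(
        "SELECT name FROM folder WHERE parent_id = ? ORDER BY id",
        (folder_id,))
    return [r[0] for r in rows]

def grandchildren(conn, folder_id):
    """Depth 2: one extra self-join per extra level."""
    rows = conn.execute(
        """SELECT gc.name
           FROM folder AS c
           JOIN folder AS gc ON gc.parent_id = c.id
           WHERE c.parent_id = ?
           ORDER BY gc.id""",
        (folder_id,))
    return [r[0] for r in rows]

def grandparent(conn, folder_id):
    """Two levels up: the same join pattern, walking child -> parent."""
    row = conn.execute(
        """SELECT gp.name
           FROM folder AS me
           JOIN folder AS p  ON p.id  = me.parent_id
           JOIN folder AS gp ON gp.id = p.parent_id
           WHERE me.id = ?""",
        (folder_id,)).fetchone()
    return row[0] if row else None
```

    Each additional level costs one more join, which is why this only works when the depth is known in advance.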

    If your tree is HUGE (millions of nodes) then you're in a world of hurt no matter what you try to do.

    If your tree is under a million nodes, just fetch it all into memory and work on it there. Life is much simpler in an OO world. Simply fetch the rows and build the tree as the rows are returned.
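    A minimal sketch of the fetch-everything approach, assuming a hypothetical `folder(id, parent_id, name)` table (the names and sample rows are illustrative only). A single SELECT pulls every row, and the tree is assembled as the rows arrive:

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE folder (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO folder VALUES
        (1, NULL, 'root'), (2, 1, 'docs'), (3, 1, 'src'),
        (4, 2, 'drafts'), (5, 4, 'old');
""")

def load_tree(conn):
    """Fetch every row once and index it by parent and by child."""
    names, parent, kids = {}, {}, defaultdict(list)
    root = None
    for node_id, parent_id, name in conn.execute(
            "SELECT id, parent_id, name FROM folder ORDER BY id"):
        names[node_id] = name
        if parent_id is None:
            root = node_id
        else:
            parent[node_id] = parent_id
            kids[parent_id].append(node_id)
    return root, names, parent, kids

def descendants(kids, node_id):
    """All nodes below node_id, depth-first, using an explicit stack."""
    out = []
    stack = list(reversed(kids.get(node_id, [])))
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(reversed(kids.get(n, [])))
    return out

def ancestors(parent, node_id):
    """All nodes above node_id, nearest first."""
    out = []
    while node_id in parent:
        node_id = parent[node_id]
        out.append(node_id)
    return out

root, names, parent, kids = load_tree(conn)
```

    Once the dictionaries are built, both unbounded queries are plain loops, and a depth limit is just an extra counter.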

    If you have a Huge tree, you have two choices.

    • Recursive cursors to handle the unlimited fetching. This means the maintenance of the structure is O(1) -- just update a few nodes and you're done. However, fetching is O(n*log(n)) because you have to open a cursor for each node with children.

    • Clever "heap numbering" algorithms can encode the parentage of each node. Once each node is properly numbered, a trivial SQL SELECT can be used for all four types of queries. Changes to the tree structure, however, require renumbering the nodes, making the cost of a change fairly high compared to the cost of retrieval.
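    The answer does not say which numbering scheme it has in mind. As one possible illustration, the sketch below uses the nested-set model (a left/right number pair per node), which is a swapped-in technique and may not be what "heap numbering" refers to. The `folder` table, the `lft`/`rgt` columns, and the sample rows are all assumptions.

```python
import sqlite3

# Hypothetical schema: lft/rgt hold the nested-set numbers for each node.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE folder (
    id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT,
    lft INTEGER, rgt INTEGER)""")
conn.executemany(
    "INSERT INTO folder (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, 'root'), (2, 1, 'docs'), (3, 1, 'src'),
     (4, 2, 'drafts'), (5, 4, 'old')])

def renumber(conn):
    """Assign lft/rgt with a depth-first walk.

    This is the expensive part: it has to be rerun (or patched up)
    whenever the tree structure changes.
    """
    kids, root = {}, None
    for node_id, parent_id in conn.execute(
            "SELECT id, parent_id FROM folder ORDER BY id").fetchall():
        if parent_id is None:
            root = node_id
        else:
            kids.setdefault(parent_id, []).append(node_id)
    counter = 0

    def visit(node_id):  # recursive for brevity; very deep trees need a stack
        nonlocal counter
        counter += 1
        lft = counter
        for child in kids.get(node_id, []):
            visit(child)
        counter += 1
        conn.execute("UPDATE folder SET lft = ?, rgt = ? WHERE id = ?",
                     (lft, counter, node_id))

    visit(root)

def descendants(conn, folder_id):
    """Everything below a node: its interval contains theirs."""
    rows = conn.execute(
        """SELECT d.name FROM folder AS a
           JOIN folder AS d ON d.lft > a.lft AND d.rgt < a.rgt
           WHERE a.id = ? ORDER BY d.lft""", (folder_id,))
    return [r[0] for r in rows]

def ancestors(conn, folder_id):
    """Everything above a node: their intervals contain its interval."""
    rows = conn.execute(
        """SELECT a.name FROM folder AS d
           JOIN folder AS a ON a.lft < d.lft AND a.rgt > d.rgt
           WHERE d.id = ? ORDER BY a.lft""", (folder_id,))
    return [r[0] for r in rows]

renumber(conn)
```

    Both unbounded queries become a single non-recursive SELECT. The depth-limited variants would additionally need a stored depth per node, which this sketch leaves out.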
