Fast Relational method of storing tree data (for instance threaded comments on articles)

你的背包 2020-12-13 10:59

I have a CMS which stores comments against articles. These comments can be both threaded and non-threaded. Although technically they are the same just with the reply column

6 Answers
  • 2020-12-13 11:21

    Unfortunately, the pure SQL methods to do it are quite slow.

    The nested sets proposed by @Marc W are quite elegant, but they may require renumbering the whole tree when a branch outgrows its range, which can be quite slow.

    See this article in my blog on how to do it fast in MySQL:

    • Hierarchical queries in MySQL - emulating Oracle's CONNECT BY

    You'll need to create a function:

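    -- DELIMITER is only needed when running this from the mysql command-line client.
    DELIMITER $$

    -- Returns the id of the next node in depth-first order and stores it in @id.
    -- The argument is not used inside the function; state is kept in the session
    -- variables @id (current node), @start_with (root) and @level (depth).
    CREATE FUNCTION hierarchy_connect_by_parent_eq_prior_id(value INT) RETURNS INT
    NOT DETERMINISTIC
    READS SQL DATA
    BEGIN
            DECLARE _id INT;
            DECLARE _parent INT;
            DECLARE _next INT;
            DECLARE CONTINUE HANDLER FOR NOT FOUND SET @id = NULL;
    
            -- Start by looking for the first child of the current node.
            SET _parent = @id;
            SET _id = -1;
    
            IF @id IS NULL THEN
                    RETURN NULL;
            END IF;
    
            LOOP
                    -- Next child of _parent whose id is greater than _id.
                    SELECT  MIN(id)
                    INTO    @id
                    FROM    t_hierarchy
                    WHERE   parent = _parent
                            AND id > _id;
                    -- Found a node, or ran out of nodes under the root: return.
                    IF @id IS NOT NULL OR _parent = @start_with THEN
                            SET @level = @level + 1;
                            RETURN @id;
                    END IF;
                    -- No more children here: step up one level and try the next sibling.
                    SET @level := @level - 1;
                    SELECT  id, parent
                    INTO    _id, _parent
                    FROM    t_hierarchy
                    WHERE   id = _parent;
            END LOOP;
    END$$

    DELIMITER ;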
    

    and use it in a query like this:

    SELECT  hi.*
    FROM    (
            SELECT  hierarchy_connect_by_parent_eq_prior_id(id) AS id, @level AS level
            FROM    (
                    SELECT  @start_with := 0,
                            @id := @start_with,
                            @level := 0
                    ) vars, t_hierarchy
            WHERE   @id IS NOT NULL
            ) ho
    JOIN    t_hierarchy hi
    ON      hi.id = ho.id
    

    This is of course MySQL specific but it's real fast.

    If you want this to be portable between PostgreSQL and MySQL, you can use PostgreSQL's contrib for CONNECT BY and wrap the query into a stored procedure with the same name for both systems.

  • 2020-12-13 11:23

    You've got a choice between the adjacency and the nested set models. The article Managing Hierarchical Data in MySQL makes for a nice introduction.

    For a theoretical discussion, see Celko's Trees and Hierarchies.

    It's rather easy to implement a threaded list if your database supports recursive queries. All you need is a self-reference in your target database table, such as:

    create table Tablename (
      RecordID integer not null auto_increment primary key,
      ParentID integer default null references Tablename (RecordID),
      ...
    )
    

    You can then use a recursive Common Table Expression to display a threaded view. An example is available here.
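
    As a rough illustration of that recursive CTE, here is a minimal sketch. It assumes a hypothetical comments table (id, parent_id, body) and uses Python's built-in sqlite3 module as a stand-in for the target database; the padded path column is just one way to get depth-first ordering, and printf is SQLite's spelling of the padding function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES comments (id),  -- NULL for top-level comments
    body      TEXT NOT NULL
);
INSERT INTO comments (id, parent_id, body) VALUES
    (1, NULL, 'first'),
    (2, 1,    'reply to first'),
    (3, 2,    'reply to reply'),
    (4, 1,    'second reply to first'),
    (5, NULL, 'second top-level');
""")

# Walk down from the top-level comments, carrying the depth and a
# zero-padded path of ids so that ORDER BY path yields thread order.
THREAD_QUERY = """
WITH RECURSIVE thread (id, body, depth, path) AS (
    SELECT id, body, 0, printf('%08d', id)
    FROM   comments
    WHERE  parent_id IS NULL
    UNION ALL
    SELECT c.id, c.body, t.depth + 1, t.path || '.' || printf('%08d', c.id)
    FROM   comments c
    JOIN   thread t ON c.parent_id = t.id
)
SELECT id, depth, body FROM thread ORDER BY path
"""

rows = conn.execute(THREAD_QUERY).fetchall()
for comment_id, depth, body in rows:
    print("    " * depth + f"[{comment_id}] {body}")
```

    Each reply is printed indented under its parent. On a large table you would normally restrict the anchor query to a single article.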

  • 2020-12-13 11:29

    I really like how Drupal solves this problem. It assigns a thread id to each comment. This id starts at 1 for the first comment. If a reply is added to this comment, the id 1.1 is assigned to it. A reply to comment 1.1 is given the thread id 1.1.1. A sibling of comment 1.1 is given the thread id 1.2. You get the idea. Calculating these thread ids can be done easily with one query when a comment is added.

    When the thread is rendered, all of the comments that belong to the thread are fetched in a single query, sorted by the thread id. This gives you the threads in the ascending order. Furthermore, using the thread id, you can find the nesting level of each comment, and indent it accordingly.

    1
    1.1
    1.1.1
    1.2
    1.2.1
    

    There are a few issues to sort out:

    • If one component of the thread id grows to 2 digits, sorting by thread id will not produce the expected order. An easy solution is ensuring that all components of a thread id are padded by zeros to have the same width.
    • Sorting by descending thread id does not produce the expected descending order.

    Drupal solves the first issue in a more complicated way using a numbering system called vancode. As for the second issue, it is solved by appending a slash (which sorts after the dot separator in ASCII) to thread ids when sorting by descending order. You can find more details about this implementation by checking the source code of the comments module (see the big comment before the function comment_get_thread).
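
    The two sorting issues can be reproduced in a few lines. This is only a sketch of the simple zero-padding fix plus a terminator character, not Drupal's vancode; WIDTH, TERMINATOR and the helper names are assumptions made for the example.

```python
WIDTH = 3         # assumed fixed width of every thread id component
TERMINATOR = "/"  # any character that sorts after the "." separator works


def pad(thread_id):
    """Zero-pad each component, e.g. '1.10' -> '001.010'."""
    return ".".join(part.zfill(WIDTH) for part in thread_id.split("."))


def depth(thread_id):
    """Nesting level: 0 for a top-level comment, 1 for a reply, and so on."""
    return thread_id.count(".")


thread_ids = ["1", "1.1", "1.2", "1.10", "1.1.1", "2"]

# Plain string sort: "1.10" lands before "1.2" (the first issue).
naive = sorted(thread_ids)

# Padded sort: siblings appear in numeric order.
ascending = sorted(thread_ids, key=pad)

# Descending sort with a terminator: newest threads first, while each
# parent still appears before its own replies (the second issue).
descending = sorted(thread_ids, key=lambda t: pad(t) + TERMINATOR, reverse=True)

for thread_id in descending:
    print("    " * depth(thread_id) + thread_id)
```

    In a database the padded value would be stored in an indexed column so that ORDER BY can use it directly.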

  • 2020-12-13 11:34

    Actually, it has to be a balance between read and write.

    If you are OK with updating a bunch of rows on every insert, then nested set (or an equivalent) will give you easy, fast reads.

    Other than that, a simple FK on the parent will give you ultra-simple insert, but might well be a nightmare for retrieval.

    I think I'd go with nested sets, but be careful about the expected data volume and usage patterns: every insert updates several, maybe many, rows on two indexed columns (the left and right values), which might become a problem at some point.
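
    To make the read/write trade-off concrete, here is a small sketch of a nested set insert. It assumes a hypothetical comments table with lft and rgt columns and uses Python's built-in sqlite3 module; add_reply is an illustrative helper, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id  INTEGER PRIMARY KEY,
    lft INTEGER NOT NULL,
    rgt INTEGER NOT NULL
);
-- Comment 1 has replies 2 and 4; comment 2 has reply 3.
INSERT INTO comments (id, lft, rgt) VALUES
    (1, 1, 8), (2, 2, 5), (3, 3, 4), (4, 6, 7);
""")


def add_reply(conn, new_id, parent_id):
    """Insert new_id as the last child of parent_id.

    Returns the number of row updates needed to make room for it.
    """
    (parent_rgt,) = conn.execute(
        "SELECT rgt FROM comments WHERE id = ?", (parent_id,)
    ).fetchone()
    # Shift every boundary at or after the insertion point by 2.
    updates = conn.execute(
        "UPDATE comments SET rgt = rgt + 2 WHERE rgt >= ?", (parent_rgt,)
    ).rowcount
    updates += conn.execute(
        "UPDATE comments SET lft = lft + 2 WHERE lft > ?", (parent_rgt,)
    ).rowcount
    conn.execute(
        "INSERT INTO comments (id, lft, rgt) VALUES (?, ?, ?)",
        (new_id, parent_rgt, parent_rgt + 1),
    )
    return updates


row_updates = add_reply(conn, new_id=5, parent_id=3)

# Reads stay simple: a whole subtree is a single range query.
subtree = [row[0] for row in conn.execute("""
    SELECT c.id
    FROM   comments c
    JOIN   comments p ON c.lft BETWEEN p.lft AND p.rgt
    WHERE  p.id = 2
    ORDER  BY c.lft
""")]
print(row_updates, subtree)
```

    Even in this four-row tree, one insert modifies every existing row, which is the cost to weigh against the cheap subtree read.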

  • 2020-12-13 11:41

    I know this answer is a bit late, but for tree data use a closure table. See http://www.slideshare.net/billkarwin/models-for-hierarchical-data

    That presentation describes 4 methods:

    • Adjacency list (the simple parent foreign key)
    • Path enumeration (the Drupal strategy mentioned in the accepted answer)
    • Nested sets
    • Closure table (storing ancestor/descendant facts in a separate relation [table], with a possible distance column)

    The last option has the advantage of easy CRUD operations compared to the rest. The cost is space, which is O(n^2) in the number of tree nodes in the worst case, but probably not so bad in practice.
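
    A closure table could look roughly like the sketch below. The table and column names (comments, comment_paths, ancestor, descendant, distance) and the add_comment helper are assumptions made for this example, and Python's built-in sqlite3 module stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id   INTEGER PRIMARY KEY,
    body TEXT NOT NULL
);
-- One row per (ancestor, descendant) pair, including each node with itself.
CREATE TABLE comment_paths (
    ancestor   INTEGER NOT NULL REFERENCES comments (id),
    descendant INTEGER NOT NULL REFERENCES comments (id),
    distance   INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
""")


def add_comment(conn, comment_id, body, parent_id=None):
    """Insert a comment and link it to itself and to all of its ancestors."""
    conn.execute(
        "INSERT INTO comments (id, body) VALUES (?, ?)", (comment_id, body)
    )
    conn.execute(
        "INSERT INTO comment_paths (ancestor, descendant, distance) VALUES (?, ?, 0)",
        (comment_id, comment_id),
    )
    if parent_id is not None:
        # Copy every path that ends at the parent, extended by one step.
        conn.execute("""
            INSERT INTO comment_paths (ancestor, descendant, distance)
            SELECT ancestor, ?, distance + 1
            FROM   comment_paths
            WHERE  descendant = ?
        """, (comment_id, parent_id))


add_comment(conn, 1, "first")
add_comment(conn, 2, "reply to first", parent_id=1)
add_comment(conn, 3, "reply to reply", parent_id=2)
add_comment(conn, 4, "second reply to first", parent_id=1)

# All replies below comment 1, at any depth, without recursion.
descendants_of_1 = conn.execute("""
    SELECT descendant, distance
    FROM   comment_paths
    WHERE  ancestor = 1 AND distance > 0
    ORDER  BY descendant
""").fetchall()

# The chain of parents above comment 3, nearest first.
ancestors_of_3 = conn.execute("""
    SELECT ancestor, distance
    FROM   comment_paths
    WHERE  descendant = 3 AND distance > 0
    ORDER  BY distance
""").fetchall()

path_rows = conn.execute("SELECT COUNT(*) FROM comment_paths").fetchone()[0]
print(descendants_of_1, ancestors_of_3, path_rows)
```

    Inserts never touch existing rows, and the space cost shows up as the extra path rows: four comments already need eight rows here.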

  • 2020-12-13 11:47

    I just did this myself, actually! I used the nested set model of representing hierarchical data in a relational database.

    Managing Hierarchical Data in MySQL was pure gold for me. Nested sets are the second model described in that article.
