How to repair a corrupted MPTT tree (nested set) in the database using SQL?

前端 未结 4 601
我在风中等你
我在风中等你 2020-12-23 22:12

I have an MPTT tree of over 100,000 records stored in MySQL using lft, rght and parent_id columns. Now the left/right values became co

4条回答
  •  一向
    一向 (楼主)
    2020-12-23 22:52

    Here's what I have adapted from @Lieven's answer, incorporating feedback from here for better performance:

    DROP PROCEDURE IF EXISTS tree_recover;
    
    DELIMITER //
    
    CREATE PROCEDURE tree_recover ()
    MODIFIES SQL DATA
    BEGIN
    
        DECLARE currentId, currentParentId  CHAR(36);
        DECLARE currentLeft                 INT;
        DECLARE startId                     INT DEFAULT 1;
    
        # Determines the max size for MEMORY tables.
        SET max_heap_table_size = 1024 * 1024 * 512;
    
        START TRANSACTION;
    
        # Temporary MEMORY table to do all the heavy lifting in,
        # otherwise performance is simply abysmal.
        CREATE TABLE `tmp_tree` (
            `id`        char(36) NOT NULL DEFAULT '',
            `parent_id` char(36)          DEFAULT NULL,
            `lft`       int(11)  unsigned DEFAULT NULL,
            `rght`      int(11)  unsigned DEFAULT NULL,
            PRIMARY KEY      (`id`),
            INDEX USING HASH (`parent_id`),
            INDEX USING HASH (`lft`),
            INDEX USING HASH (`rght`)
        ) ENGINE = MEMORY
        SELECT `id`,
               `parent_id`,
               `lft`,
               `rght`
        FROM   `tree`;
    
        # Leveling the playing field.
        UPDATE  `tmp_tree`
        SET     `lft`  = NULL,
                `rght` = NULL;
    
        # Establishing starting numbers for all root elements.
        WHILE EXISTS (SELECT * FROM `tmp_tree` WHERE `parent_id` IS NULL AND `lft` IS NULL AND `rght` IS NULL LIMIT 1) DO
    
            UPDATE `tmp_tree`
            SET    `lft`  = startId,
                   `rght` = startId + 1
            WHERE  `parent_id` IS NULL
              AND  `lft`       IS NULL
              AND  `rght`      IS NULL
            LIMIT  1;
    
            SET startId = startId + 2;
    
        END WHILE;
    
        # Switching the indexes for the lft/rght columns to B-Trees to speed up the next section, which uses range queries.
        DROP INDEX `lft`  ON `tmp_tree`;
        DROP INDEX `rght` ON `tmp_tree`;
        CREATE INDEX `lft`  USING BTREE ON `tmp_tree` (`lft`);
        CREATE INDEX `rght` USING BTREE ON `tmp_tree` (`rght`);
    
        # Numbering all child elements
        WHILE EXISTS (SELECT * FROM `tmp_tree` WHERE `lft` IS NULL LIMIT 1) DO
    
            # Picking an unprocessed element which has a processed parent.
            SELECT     `tmp_tree`.`id`
              INTO     currentId
            FROM       `tmp_tree`
            INNER JOIN `tmp_tree` AS `parents`
                    ON `tmp_tree`.`parent_id` = `parents`.`id`
            WHERE      `tmp_tree`.`lft` IS NULL
              AND      `parents`.`lft`  IS NOT NULL
            LIMIT      1;
    
            # Finding the element's parent.
            SELECT  `parent_id`
              INTO  currentParentId
            FROM    `tmp_tree`
            WHERE   `id` = currentId;
    
            # Finding the parent's lft value.
            SELECT  `lft`
              INTO  currentLeft
            FROM    `tmp_tree`
            WHERE   `id` = currentParentId;
    
            # Shifting all elements to the right of the current element 2 to the right.
            UPDATE `tmp_tree`
            SET    `rght` = `rght` + 2
            WHERE  `rght` > currentLeft;
    
            UPDATE `tmp_tree`
            SET    `lft` = `lft` + 2
            WHERE  `lft` > currentLeft;
    
            # Setting lft and rght values for current element.
            UPDATE `tmp_tree`
            SET    `lft`  = currentLeft + 1,
                   `rght` = currentLeft + 2
            WHERE  `id`   = currentId;
    
        END WHILE;
    
        # Writing calculated values back to physical table.
        UPDATE `tree`, `tmp_tree`
        SET    `tree`.`lft`  = `tmp_tree`.`lft`,
               `tree`.`rght` = `tmp_tree`.`rght`
        WHERE  `tree`.`id`   = `tmp_tree`.`id`;
    
        COMMIT;
    
        DROP TABLE `tmp_tree`;
    
    END//
    
    DELIMITER ;
    

    Worked well with some test data, but it's still running on my 100,000 records tree, so I can't give any final judgement yet. The naïve script working directly on the physical table has abysmal performance, running for at least hours, more likely days. Switching to a temporary MEMORY table brought this time down to about an hour, choosing the right indexes cut it down to 10 mins.

提交回复
热议问题