To find infinite recursive loop in CTE

后端 未结 6 378
梦如初夏
梦如初夏 2020-12-16 00:55

I\'m not a SQL expert, but if anybody can help me.

I use a recursive CTE to get the values as below.

Child1 --> Parent 1

Parent1 --> Parent 2

相关标签:
6条回答
  • 2020-12-16 01:24

    Here's an alternate method for detecting cycles in adjacency lists (parent/child relationships) where nodes can only have one parent which can be enforced with a unique constraint on the child column (id in the table below). This works by computing the closure table for the adjacency list via a recursive query. It starts by adding every node to the closure table as its own ancestor at level 0 then iteratively walks the adjacency list to expand the closure table. Cycles are detected when a new record's child and ancestor are the same at any level other than the original level zero (0):

    -- For PostgreSQL and MySQL 8 use the Recursive key word in the CTE code:
    -- with RECURSIVE cte(ancestor, child, lev, cycle) as (
    
    with cte(ancestor, child, lev, cycle) as (
      select id, id, 0, 0 from Table1
      union all
      select cte.ancestor
           , Table1.id
           , case when cte.ancestor = Table1.id then 0 else cte.lev + 1 end
           , case when cte.ancestor = Table1.id then cte.lev + 1 else 0 end
        from Table1
        join cte
          on cte.child = Table1.PARENT_ID
       where cte.cycle = 0
    ) -- In oracle uncomment the next line
    -- cycle child set isCycle to 'Y' default 'N'
    select distinct
           ancestor
         , child
         , lev
         , max(cycle) over (partition by ancestor) cycle
      from cte
    

    Given the following adjacency list for Table1:

    | parent_id | id |
    |-----------|----|
    |    (null) |  1 |
    |    (null) |  2 |
    |         1 |  3 |
    |         3 |  4 |
    |         1 |  5 |
    |         2 |  6 |
    |         6 |  7 |
    |         7 |  8 |
    |         9 | 10 |
    |        10 | 11 |
    |        11 |  9 |
    

    The above query which works on SQL Sever (and Oracle, PostgreSQL and MySQL 8 when modified as directed) rightly detects that nodes 9, 10, and 11 participate in a cycle of length 3.

    SQL(/DB) Fiddles demonstrating this in various DBs can be found below:

    • Oracle 11gR2
    • SQL Server 2017
    • PostgeSQL 9.5
    • MySQL 8
    0 讨论(0)
  • 2020-12-16 01:27

    With Postgres it's quite easy to prevent this by collecting all visited nodes in an array.

    Setup:

    create table hierarchy (id integer, parent_id integer);
    
    insert into hierarchy
    values
    (1, null), -- root element
    (2, 1), -- first child
    (3, 1), -- second child
    (4, 3), 
    (5, 4), 
    (3, 5); -- endless loop
    

    Recursive query:

    with recursive tree as (
      select id, 
             parent_id, 
             array[id] as all_parents
      from hierarchy
      where parent_id is null
    
      union all
    
      select c.id, 
             c.parent_id,
             p.all_parents||c.id
      from hierarchy c
         join tree p
          on c.parent_id = p.id 
         and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
    )
    select *
    from tree;
    

    To do this for multiple trees at the same time, you need to carry over the ID of the root node to the children:

    with recursive tree as (
      select id, 
             parent_id, 
             array[id] as all_parents, 
             id as root_id
      from hierarchy
      where parent_id is null
    
      union all
    
      select c.id, 
             c.parent_id,
             p.all_parents||c.id, 
             p.root_id
      from hierarchy c
         join tree p
          on c.parent_id = p.id 
         and c.id <> ALL (p.all_parents) -- this is the trick to exclude the endless loops
         and c.root_id = p.root_id
    )
    select *
    from tree;
    
    0 讨论(0)
  • 2020-12-16 01:27

    Here is the solution for SQL Server:

    Table Insert script:

    CREATE TABLE MyTable
    (
        [ID] INT,
        [ParentID] INT,
        [Name] NVARCHAR(255)
    );
    
    INSERT INTO MyTable
    (
        [ID],
        [ParentID],
        [Name]
    )
    VALUES
    (1, NULL, 'A root'),
    (2, NULL, 'Another root'),
    (3, 1, 'Child of 1'),
    (4, 3, 'Grandchild of 1'),
    (5, 4, 'Great grandchild of 1'),
    (6, 1, 'Child of 1'),
    (7, 8, 'Child of 8'),
    (8, 7, 'Child of 7'), -- This will cause infinite recursion
    (9, 1, 'Child of 1');
    

    Script to find the exact records which are the culprit:

    ;WITH RecursiveCTE
    AS (
       -- Get all parents: 
       -- Any record in MyTable table could be an Parent
       -- We don't know here yet which record can involve in an infinite recursion.
       SELECT ParentID AS StartID,
              ID,
              CAST(Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
       FROM MyTable
       UNION ALL
    
       -- Recursively try finding all the childrens of above parents
       -- Keep on finding it until this child become parent of above parent.
       -- This will bring us back in the circle to parent record which is being
       -- keep in the StartID column in recursion
       SELECT RecursiveCTE.StartID,
              t.ID,
              CAST(RecursiveCTE.[ParentChildRelationPath] + ' -> ' + t.Name AS NVARCHAR(255)) AS [ParentChildRelationPath]
       FROM RecursiveCTE
           INNER JOIN MyTable AS t
               ON t.ParentID = RecursiveCTE.ID
       WHERE RecursiveCTE.StartID != RecursiveCTE.ID)
    
    -- FInd the ones which causes the infinite recursion
    SELECT StartID,
           [ParentChildRelationPath],
           RecursiveCTE.ID
    FROM RecursiveCTE
    WHERE StartID = ID
    OPTION (MAXRECURSION 0);
    

    Output of above query:

    0 讨论(0)
  • 2020-12-16 01:36

    You haven't specified the dialect or your column names, so it is difficult to make the perfect example...

    -- Some random data
    IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
        DROP TABLE #MyTable
    
    CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
    INSERT INTO #MyTable (ID, ParentID, Description) VALUES
    (1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
    (2, 1, 'Child'), -- Try changing the second value (1) to 2 
    (3, 2, 'SubChild')
    -- End random data
    
    ;WITH RecursiveCTE (StartingID, Level, Parents, Loop, ID, ParentID, Description) AS
    (
        SELECT ID, 1, '|' + CAST(ID AS VARCHAR(MAX)) + '|', 0, * FROM #MyTable
        UNION ALL
        SELECT R.StartingID, R.Level + 1, 
            R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
            CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
            MT.*
            FROM #MyTable MT
            INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
    )
    
    SELECT StartingID, Level, Parents, MAX(Loop) OVER (PARTITION BY StartingID) Loop, ID, ParentID, Description 
        FROM RecursiveCTE 
        ORDER BY StartingID, Level
    

    Something like this will show if/where there are loops in the recursive cte. Look at the column Loop. With the data as is, there is no loops. In the comments there are examples on how to change the values to cause a loop.

    In the end the recursive cte creates a VARCHAR(MAX) of ids in the form |id1|id2|id3| (called Parents) and then checks if the current ID is already in that "list". If yes, it sets the Loop column to 1. This column is checked in the recursive join (the ABD R.Loop = 0).

    The ending query uses a MAX() OVER (PARTITION BY ...) to set to 1 the Loop column for a whole "block" of chains.

    A little more complex, that generates a "better" report:

    -- Some random data
    IF OBJECT_ID('tempdb..#MyTable') IS NOT NULL
        DROP TABLE #MyTable
    
    CREATE TABLE #MyTable (ID INT PRIMARY KEY, ParentID INT NULL, Description VARCHAR(100))
    INSERT INTO #MyTable (ID, ParentID, Description) VALUES
    (1, NULL, 'Parent'), -- Try changing the second value (NULL) to 1 or 2 or 3
    (2, 1, 'Child'), -- Try changing the second value (1) to 2 
    (3, 3, 'SubChild')
    -- End random data
    
    -- The "terminal" childrens (that are elements that don't have childrens
    -- connected to them)
    ;WITH WithoutChildren AS
    (
        SELECT MT1.* FROM #MyTable MT1
            WHERE NOT EXISTS (SELECT 1 FROM #MyTable MT2 WHERE MT1.ID != MT2.ID AND MT1.ID = MT2.ParentID)
    )
    
    , RecursiveCTE (StartingID, Level, Parents, Descriptions, Loop, ParentID) AS
    (
        SELECT ID, -- StartingID 
            1, -- Level
            '|' + CAST(ID AS VARCHAR(MAX)) + '|', 
            '|' + CAST(Description AS VARCHAR(MAX)) + '|', 
            0, -- Loop
            ParentID
            FROM WithoutChildren
        UNION ALL
        SELECT R.StartingID, -- StartingID
            R.Level + 1, -- Level
            R.Parents + CAST(MT.ID AS VARCHAR(MAX)) + '|',
            R.Descriptions + CAST(MT.Description AS VARCHAR(MAX)) + '|', 
            CASE WHEN R.Parents LIKE '%|' + CAST(MT.ID AS VARCHAR(MAX)) + '|%' THEN 1 ELSE 0 END,
            MT.ParentID
            FROM #MyTable MT
            INNER JOIN RecursiveCTE R ON R.ParentID = MT.ID AND R.Loop = 0
    )
    
    SELECT * FROM RecursiveCTE 
        WHERE ParentID IS NULL OR Loop = 1
    

    This query should return all the "last child" rows, with the full parent chain. The column Loop is 0 if there is no loop, 1 if there is a loop.

    0 讨论(0)
  • 2020-12-16 01:47

    You can use the same approach described by Knuth for detecting a cycle in a linked list here. In one column, keep track of the children, the children's children, the children's children's children, etc. In another column, keep track of the grandchildren, the grandchildren's grandchildren, the grandchildren's grandchildren's grandchildren, etc.

    For the initial selection, the distance between Child and Grandchild columns is 1. Every selection from union all increases the depth of Child by 1, and that of Grandchild by 2. The distance between them increases by 1.

    If you have any loop, since the distance only increases by 1 each time, at some point after Child is in the loop, the distance will be a multiple of the cycle length. When that happens, the Child and the Grandchild columns are the same. Use that as an additional condition to stop the recursion, and detect it in the rest of your code as an error.

    SQL Server sample:

    declare @LinkTable table (Parent int, Child int);
    insert into @LinkTable values (1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7), (7, 1);
    
    with cte as (
        select lt1.Parent, lt1.Child, lt2.Child as Grandchild
        from @LinkTable lt1
        inner join @LinkTable lt2 on lt2.Parent = lt1.Child
        union all
        select cte.Parent, lt1.Child, lt3.Child as Grandchild
        from cte
        inner join @LinkTable lt1 on lt1.Parent = cte.Child
        inner join @LinkTable lt2 on lt2.Parent = cte.Grandchild
        inner join @LinkTable lt3 on lt3.Parent = lt2.Child
        where cte.Child <> cte.Grandchild
    )
    select Parent, Child
    from cte
    where Child = Grandchild;
    

    Remove one of the LinkTable records that causes the cycle, and you will find that the select no longer returns any data.

    0 讨论(0)
  • 2020-12-16 01:50

    Try to limit the recursive result

    WITH EMP_CTE AS
    ( 
    
        SELECT 
            0 AS [LEVEL],   
            ManagerId, EmployeeId, Name
        FROM Employees
        WHERE ManagerId IS NULL
    
        UNION ALL
    
        SELECT 
            [LEVEL] + 1 AS [LEVEL],
            ManagerId, EmployeeId, Name
        FROM Employees e
        INNER JOIN EMP_CTE c ON e.ManagerId = c.EmployeeId 
     AND s.LEVEL < 100 --RECURSION LIMIT
    ) 
    
        SELECT  * FROM EMP_CTE WHERE [Level] = 100
    
    0 讨论(0)
提交回复
热议问题