Prevent and/or detect cycles in postgres

刺人心 2020-12-06 12:28

Assuming a schema like the following:

CREATE TABLE node (
  id       SERIAL PRIMARY KEY,
  name     VARCHAR,
  parentid INT REFERENCES node(id)
);

4 Answers
  • 2020-12-06 13:10

    The currently accepted answer by @Erwin Brandstetter works when updates and inserts are processed one at a time, but it can still fail under concurrent execution.

    Assume the table content defined by

    INSERT INTO node VALUES
    (1, 'A', NULL),
    (2, 'B', 1),
    (3, 'C', NULL),
    (4, 'D', 3);
    

    and then in one transaction, execute

    -- transaction A
    UPDATE node SET parentid = 2 where id = 3;
    

    and in another

    -- transaction B
    UPDATE node SET parentid = 4 where id = 1;
    

    Both UPDATE commands will succeed, because neither trigger can see the other transaction's uncommitted change, and both transactions can then be committed.

    -- transaction A
    COMMIT;
    
    -- transaction B
    COMMIT;
    

    You will then have a cycle 1->4->3->2->1 in the table. To make it work, you will either have to use isolation level SERIALIZABLE or use explicit locking in the trigger.
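
    As a minimal sketch of the explicit-locking option (an illustration, not part of the accepted answer), the trigger function can take a transaction-scoped advisory lock before running the check, so that concurrent checks run one after another. The names detect_cycle_locked and detect_cycle_locked_after_write and the lock key 42 are hypothetical choices. The sketch assumes the default READ COMMITTED isolation level, where the check query sees everything committed before it starts; under REPEATABLE READ the snapshot is fixed and the lock alone would not help.

    ```sql
    -- Sketch only: serialize all cycle checks on "node" with an advisory lock.
    CREATE OR REPLACE FUNCTION detect_cycle_locked()
      RETURNS TRIGGER
      LANGUAGE plpgsql AS
    $func$
    BEGIN
       -- Waits until any other transaction holding key 42 (arbitrary,
       -- hypothetical value) has committed or rolled back.
       -- The lock is released automatically at the end of this transaction.
       PERFORM pg_advisory_xact_lock(42);

       -- Same ancestor walk as in the accepted answer, now run after the lock.
       IF EXISTS (
          WITH RECURSIVE search_graph(parentid, path, cycle) AS (
             SELECT g.parentid, ARRAY[g.id, g.parentid], (g.id = g.parentid)
             FROM   node g
             WHERE  g.id = NEW.id
             UNION ALL
             SELECT g.parentid, sg.path || g.parentid, g.parentid = ANY(sg.path)
             FROM   search_graph sg
             JOIN   node g ON g.id = sg.parentid
             WHERE  NOT sg.cycle
             )
          SELECT 1 FROM search_graph WHERE cycle
          )
       THEN
          RAISE EXCEPTION 'Loop detected!';
       END IF;
       RETURN NEW;
    END
    $func$;

    -- Swap the trigger over to the locking variant.
    DROP TRIGGER IF EXISTS detect_cycle_after_update ON node;
    CREATE TRIGGER detect_cycle_locked_after_write
    AFTER INSERT OR UPDATE ON node
    FOR EACH ROW EXECUTE PROCEDURE detect_cycle_locked();

    -- Alternative without locking: run every writer at SERIALIZABLE.
    -- One of two conflicting transactions is then expected to fail with a
    -- serialization failure (SQLSTATE 40001) and has to be retried by the client.
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    UPDATE node SET parentid = 2 WHERE id = 3;
    COMMIT;
    ```

    With the locking variant, transaction B in the example above would block inside its trigger until transaction A commits, then see A's change and raise the exception. The cost is that all writes to the table are serialized, and a deadlock is still possible if two transactions also touch the same rows in a different order.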

  • 2020-12-06 13:18

    Your trigger simplified and optimized, should be considerably faster:

    CREATE OR REPLACE FUNCTION detect_cycle()
      RETURNS TRIGGER
      LANGUAGE plpgsql AS
    $func$
    BEGIN
       IF EXISTS (
          WITH RECURSIVE search_graph(parentid, path, cycle) AS ( -- relevant columns
              -- check ahead, makes 1 step less
             SELECT g.parentid, ARRAY[g.id, g.parentid], (g.id = g.parentid)
             FROM   node g
             WHERE  g.id = NEW.id  -- only test starting from new row
             
             UNION ALL
             SELECT g.parentid, sg.path || g.parentid, g.parentid = ANY(sg.path)
             FROM   search_graph sg
             JOIN   node g ON g.id = sg.parentid
             WHERE  NOT sg.cycle
             )
          SELECT FROM search_graph
          WHERE  cycle
          LIMIT  1  -- stop evaluation at first find
          )
       THEN
          RAISE EXCEPTION 'Loop detected!';
       ELSE
         RETURN NEW;
       END IF;
    END
    $func$;
    

    You don't need dynamic SQL, you don't need to count, you don't need all the columns and you don't need to test the whole table for every single row.

    CREATE TRIGGER detect_cycle_after_update
    AFTER INSERT OR UPDATE ON node
    FOR EACH ROW EXECUTE PROCEDURE detect_cycle();

    An INSERT like this has to be prohibited, too:

    INSERT INTO node (id, name, parentid) VALUES (8, 'D', 9), (9, 'E', 8);
    
  • 2020-12-06 13:25
    CREATE OR REPLACE FUNCTION detect_cycle()
      RETURNS TRIGGER AS
    $func$
    DECLARE
      cycle int[];
    BEGIN
    EXECUTE format('WITH RECURSIVE search_graph(%4$I, path, cycle) AS (
      SELECT g.%4$I, ARRAY[g.%3$I, g.%4$I], (g.%3$I = g.%4$I)
        FROM %1$I.%2$I g
       WHERE g.%3$I = $1.%3$I
      UNION ALL
      SELECT g.%4$I, sg.path || g.%4$I, g.%4$I = ANY(sg.path)
        FROM search_graph  sg
        JOIN %1$I.%2$I g ON g.%3$I = sg.%4$I
       WHERE NOT sg.cycle)
    SELECT path
      FROM search_graph
     WHERE cycle
     LIMIT 1', TG_TABLE_SCHEMA, TG_TABLE_NAME, TG_ARGV[0], TG_ARGV[1])  -- %I already quotes identifiers
    INTO cycle
    USING NEW;
    IF cycle IS NULL
    THEN
      RETURN NEW;
    ELSE
       RAISE EXCEPTION 'Loop in %.% detected: %', TG_TABLE_SCHEMA, TG_TABLE_NAME, array_to_string(cycle, ' -> ');
    END IF;
    
    END
    $func$ LANGUAGE plpgsql;
    
    CREATE TRIGGER detect_cycle_after_update
     AFTER INSERT OR UPDATE ON node
       FOR EACH ROW EXECUTE PROCEDURE detect_cycle('id', 'parentid');
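
    A note on quoting in dynamic SQL of this kind: the %I specifier of format() already quotes identifiers where needed, so passing a value through quote_ident() first quotes it twice. For plain names such as id both forms give the same result; the difference only shows up for names that need quoting. The column name "parent id" below is a made-up example.

    ```sql
    -- %I alone quotes the identifier once.
    SELECT format('%I', 'parent id');               -- "parent id"

    -- quote_ident() followed by %I quotes it twice, yielding a different name.
    SELECT format('%I', quote_ident('parent id'));  -- """parent id"""

    -- Plain lower-case names are unaffected either way.
    SELECT format('%I', 'id'), format('%I', quote_ident('id'));  -- id | id
    ```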
    
  • 2020-12-06 13:29

    To answer my own question, I came up with a trigger that prevents this:

    CREATE OR REPLACE FUNCTION detect_cycle() RETURNS TRIGGER AS
    $func$
    DECLARE
      loops INTEGER;
    BEGIN
       EXECUTE 'WITH RECURSIVE search_graph(id, parentid, name, depth, path, cycle) AS (
            SELECT g.id, g.parentid, g.name, 1,
              ARRAY[g.id],
              false
            FROM node g
          UNION ALL
            SELECT g.id, g.parentid, g.name, sg.depth + 1,
              path || g.id,
              g.id = ANY(path)
            FROM node g, search_graph sg
            WHERE g.id = sg.parentid AND NOT cycle
    )
    SELECT count(*) FROM search_graph where cycle = TRUE' INTO loops;
    IF loops > 0 THEN
      RAISE EXCEPTION 'Loop detected!';
    ELSE
      RETURN NEW;
    END IF;
    END
    $func$ LANGUAGE plpgsql;
    
    CREATE TRIGGER detect_cycle_after_update
    AFTER UPDATE ON node
    FOR EACH ROW EXECUTE PROCEDURE detect_cycle();
    

    So, if you try to create a loop, like in the question:

    UPDATE node SET parentid = 2 WHERE id = 1;
    

    You get an EXCEPTION:

    ERROR:  Loop detected!
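
    The trigger only guards future writes. To audit rows that already exist (for example, data loaded before the trigger was created), a standalone query along the following lines could list the cycles. This is a sketch against the node table from the question; the start_id column and the final filter are additions for illustration.

    ```sql
    -- Sketch: walk upwards from every row and report paths that close a loop.
    WITH RECURSIVE search_graph(start_id, parentid, path, cycle) AS (
       SELECT g.id, g.parentid, ARRAY[g.id], false
       FROM   node g
       UNION ALL
       SELECT sg.start_id, g.parentid, sg.path || g.id, g.id = ANY(sg.path)
       FROM   search_graph sg
       JOIN   node g ON g.id = sg.parentid
       WHERE  NOT sg.cycle              -- stop extending a path once it repeats
    )
    SELECT start_id, path
    FROM   search_graph
    WHERE  cycle
      AND  path[1] = path[array_upper(path, 1)];  -- keep only rows that lie on the cycle themselves
    ```

    Each cycle is reported once per member row, with the path starting and ending at that row. Rows that merely lead into a cycle are filtered out by the last condition.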
    