How do I sort a linked list in sql?

后端 未结 4 747
遇见更好的自我
遇见更好的自我 2020-12-08 07:45

I have implemented a linked list as a self-referencing database table:

CREATE TABLE LinkedList(
    Id bigint NOT NULL,
    ParentId bigint NULL,
    SomeDat         


        
相关标签:
4条回答
  • 2020-12-08 08:27

    In Oracle:

    SELECT Id, ParentId, SomeData
    FROM (
      SELECT ll.*, level AS lvl
      FROM LinkedList ll
      START WITH
        ParentID IS NULL
      CONNECT BY
        ParentId = PRIOR Id
    )
    ORDER BY
      lvl
    

    P. S. It's a bad practice to use NULL as ParentID, as it is not searchable by indices. Insert a surrogate root with id of 0 or -1 instead, and use START WITH ParentID = 0.

    0 讨论(0)
  • 2020-12-08 08:35

    I found a solution for SQLServer, but looks big and much less elegant than Quassnoi's

    WITH SortedList (Id, ParentId, SomeData, Level)
    AS
    (
      SELECT Id, ParentId, SomeData, 0 as Level
        FROM LinkedList
       WHERE ParentId IS NULL
      UNION ALL
      SELECT ll.Id, ll.ParentId, ll.SomeData, Level+1 as Level
        FROM LinkedList ll
       INNER JOIN SortedList as s
          ON ll.ParentId = s.Id
    )
    
    SELECT Id, ParentId, SomeData
      FROM SortedList
     ORDER BY Level
    
    0 讨论(0)
  • 2020-12-08 08:38

    (edit: d'oh! While I was debugging you found it too!)

    In SQL Server:

    ;WITH cte (Id, ParentId, SomeData, [Level]) AS (
        SELECT Id, ParentId, SomeData, 0
        FROM LinkedList
        WHERE ParentId IS NULL
        UNION ALL
        SELECT ll.Id, ll.ParentId, ll.SomeData, cte.[Level] + 1
        FROM LinkedList ll
        INNER JOIN cte ON ll.ParentID = cte.ID
    )
    SELECT * FROM cte
    ORDER BY [Level]
    
    0 讨论(0)
  • 2020-12-08 08:45

    PostgreSQL version.

    Create table, indexes and data:

    DROP TABLE IF EXISTS LinkedList;
    
    CREATE TABLE LinkedList (
        Id BIGINT NOT NULL,
        ParentId BIGINT NULL,
        SomeData VARCHAR(50)
    );
    
    CREATE INDEX LinkedList_Id_idx on LinkedList (Id);
    CREATE index LinkedList_ParentId_idx on LinkedList (ParentId);
    
    INSERT INTO LinkedList
        (Id, ParentId, SomeData)
    VALUES 
        (24971,   NULL,      0),
        (38324,   24971,     1),
        (60088,   60089,     3),
        (60089,   38324,     2),
        (61039,   61497,     5),
        (61497,   60088,     4),
        (109397,  109831,    7),
        (109831,  61039,     6);
    

    Actual query:

    WITH RECURSIVE SortedList AS (
        SELECT
            *,
            0 AS SortKey
        FROM LinkedList
        WHERE ParentId IS NULL
        UNION ALL (
            SELECT
                LinkedList.*,
                SortedList.SortKey + 1 AS SortKey
            FROM LinkedList
            INNER JOIN SortedList
                ON (LinkedList.ParentId = SortedList.Id)
        )
    )
    SELECT
        *
    FROM SortedList
    ORDER BY SortKey;
    

    Results:

       id   | parentid | somedata | sortkey
    --------+----------+----------+---------
      24971 |          | 0        |       0
      38324 |    24971 | 1        |       1
      60089 |    38324 | 2        |       2
      60088 |    60089 | 3        |       3
      61497 |    60088 | 4        |       4
      61039 |    61497 | 5        |       5
     109831 |    61039 | 6        |       6
     109397 |   109831 | 7        |       7
    

    Also did some benchmarks:

    \set N 10000
    
    DELETE FROM LinkedList;
    INSERT INTO LinkedList VALUES (1, NULL, 1);
    INSERT INTO LinkedList (
        SELECT
            generate_series AS Id,
            (generate_series - 1) AS ParentId,
            generate_series AS SomeData
        FROM GENERATE_SERIES(2, :N)
    );
    
    EXPLAIN ANALYZE
    WITH RECURSIVE SortedList AS (
        SELECT
            *,
            0 AS SortKey
        FROM LinkedList
        WHERE ParentId IS NULL
        UNION ALL (
            SELECT
                LinkedList.*,
                SortedList.SortKey + 1 AS SortKey
            FROM LinkedList
            INNER JOIN SortedList
                ON (LinkedList.ParentId = SortedList.Id)
        )
    )
    SELECT
        *
    FROM SortedList
    ORDER BY SortKey;
    

    Results:

    Sort  (cost=6236.12..6300.16 rows=25616 width=138) (actual time=17857.640..17858.207 rows=10000 loops=1)
      Sort Key: sortedlist.sortkey
      Sort Method: quicksort  Memory: 1166kB
      CTE sortedlist
        ->  Recursive Union  (cost=4.40..2007.10 rows=25616 width=138) (actual time=0.032..17844.139 rows=10000 loops=1)
              ->  Bitmap Heap Scan on linkedlist  (cost=4.40..42.78 rows=16 width=138) (actual time=0.031..0.032 rows=1 loops=1)
                    Recheck Cond: (parentid IS NULL)
                    Heap Blocks: exact=1
                    ->  Bitmap Index Scan on linkedlist_parentid_idx  (cost=0.00..4.40 rows=16 width=0) (actual time=0.006..0.006 rows=2 loops=1)
                          Index Cond: (parentid IS NULL)
              ->  Hash Join  (cost=5.20..145.20 rows=2560 width=138) (actual time=0.896..1.780 rows=1 loops=10000)
                    Hash Cond: (linkedlist_1.parentid = sortedlist_1.id)
                    ->  Seq Scan on linkedlist linkedlist_1  (cost=0.00..96.00 rows=3200 width=134) (actual time=0.002..0.784 rows=10000 loops=10000)
                    ->  Hash  (cost=3.20..3.20 rows=160 width=12) (actual time=0.001..0.001 rows=1 loops=10000)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  WorkTable Scan on sortedlist sortedlist_1  (cost=0.00..3.20 rows=160 width=12) (actual time=0.000..0.001 rows=1 loops=10000)
      ->  CTE Scan on sortedlist  (cost=0.00..512.32 rows=25616 width=138) (actual time=0.034..17851.344 rows=10000 loops=1)
    Planning Time: 0.163 ms
    Execution Time: 17858.957 ms
    

    So this query is pretty slow.

    0 讨论(0)
提交回复
热议问题