Finding the spanning forest (WITH RECURSIVE, PostgreSQL 9.5)

半腔热情 提交于 2021-01-27 10:46:51

问题


I have a table of identities (i.e. aliases) for an arbitrary number of people. Each row has a previous name and a new name. In production, there are about 1 M rows. For example:

id, old, new
---
1, 'Albert', 'Bob'
2, 'Bob', 'Charles'
3, 'Mary', 'Nancy'
4, 'Charles', 'Albert'
5, 'Lydia', 'Nancy'
6, 'Zoe', 'Zoe'

What I want is to generate the list of users and reference all of their respective identities. This is analogous to finding all of the nodes in each graph of connected identities, or finding the spanning forest:

User 1: Albert, Bob, Charles (identities: 1,2,4)
User 2: Mary, Nancy, Lydia (identities: 3,5)
User 3: Zoe (identities: 6)

I've been tinkering with PostgreSQL's WITH RECURSIVE, but it yields both sets and subsets for each. For example:

1,2,4 <-- spanning tree: good
2     <-- subset: discard
3,5   <-- spanning tree: good
4     <-- subset: discard
5     <-- subset: discard
6     <-- spanning tree: good

What do I need to do to only produce the full sets of identities (i.e. the spanning tree) for each user?

SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4 with my latest attempt. Here's the code:

WITH RECURSIVE search_graph AS (
   SELECT id
    , id AS min_id
    , ARRAY[id] AS path
    , ARRAY[old,new] AS emails
   FROM   identities

   UNION 

   SELECT identities.id
    , LEAST(identities.id, sg.min_id)
    , (sg.path || identities.id)
    , (sg.emails || identities.old || identities.new)

   FROM search_graph sg
   JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
   WHERE  identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph, unnest(path) p GROUP BY min_id;

And the results:

1,2,4
2
3,5
4
5
6

回答1:


I wrote an answer to a similar question a while ago: How to find all connected subgraphs of an undirected graph. In that question I used SQL Server. See that answer for detailed explanation of intermediate CTEs. I adapted that query to Postgres.

It may be written more efficiently using Postgres array feature instead of concatenating the path into a text column.

WITH RECURSIVE
CTE_Idents
AS
(
    SELECT old AS Ident
    FROM identities

    UNION

    SELECT new AS Ident
    FROM identities
)
,CTE_Pairs
AS
(
    SELECT old AS Ident1, new AS Ident2
    FROM identities
    WHERE old <> new

    UNION

    SELECT new AS Ident1, old AS Ident2
    FROM identities
    WHERE old <> new
)
,CTE_Recursive
AS
(
    SELECT
        CTE_Idents.Ident AS AnchorIdent 
        , Ident1
        , Ident2
        , ',' || Ident1 || ',' || Ident2 || ',' AS IdentPath
        , 1 AS Lvl
    FROM 
        CTE_Pairs
        INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1

    UNION ALL

    SELECT 
        CTE_Recursive.AnchorIdent 
        , CTE_Pairs.Ident1
        , CTE_Pairs.Ident2
        , CTE_Recursive.IdentPath || CTE_Pairs.Ident2 || ',' AS IdentPath
        , CTE_Recursive.Lvl + 1 AS Lvl
    FROM
        CTE_Pairs
        INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
    WHERE
        CTE_Recursive.IdentPath NOT LIKE ('%,' || CTE_Pairs.Ident2 || ',%')
)
,CTE_RecursionResult
AS
(
    SELECT AnchorIdent, Ident1, Ident2
    FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
    SELECT AnchorIdent, Ident1 AS Ident
    FROM CTE_RecursionResult

    UNION

    SELECT AnchorIdent, Ident2 AS Ident
    FROM CTE_RecursionResult
)
,CTE_Groups
AS
(
  SELECT
    CTE_Idents.Ident
    ,array_agg(COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident) 
        ORDER BY COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)) AS AllIdents
  FROM
    CTE_Idents
    LEFT JOIN CTE_CleanResult ON CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
  GROUP BY CTE_Idents.Ident
)
SELECT AllIdents
FROM CTE_Groups
GROUP BY AllIdents
;

I added one row (7,X,Y) to your sample data.

SQL Fiddle

Result

|          allidents |
|--------------------|
|   Lydia,Mary,Nancy |
| Albert,Bob,Charles |
|                X,Y |
|                Zoe |


来源:https://stackoverflow.com/questions/36685015/finding-the-spanning-forest-with-recursive-postgresql-9-5

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!