How to label “transitive groups” with SQL?

悲哀的现实 2020-12-06 03:09

I have a table of ID pairs that are in a transitive relation t, that is, if "A t B" and "B t C", then "A t C". Sample:

2 Answers
  •  無奈伤痛
    2020-12-06 03:37

    You can do this in Postgres, although not every database supports the features it needs. Here is the query:

    with 
        recursive cte(id1, id2, level) as (
         select id1, id2, 1 as level
         from t
         union all
         select t.id1, cte.id2, cte.level + 1
         from t join
              cte
              on t.id2 = cte.id1
      )
    select id1, id2,
           dense_rank() over (order by grp) as label
    from (select id1, id2,
                 least(min(id2) over (partition by id1), min(id1) over (partition by id2)) as grp,
                 level
          from cte
         ) t
    where level = 1;
    

    With the SQL Fiddle here.

    You are walking through a tree structure in order to assign the label (cycles might cause problems with this particular version, by the way). In Postgres, you can do this with an explicit recursive CTE. In SQL Server, you can do this with a CTE that is implicitly recursive (the keyword recursive is not used). In Oracle, you can do this with connect by.

    The recursive CTE gets all pairs that are connected to each other. The main query then assigns the minimum value of id1 and id2 to the pair, to identify all pairs that are connected to each other. The final label is produced just by assigning a sequential value to the grp.
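
    To see these steps on concrete data, here is a minimal sketch that runs the same logic in SQLite through Python's sqlite3 module. Several things here are assumptions and not part of the original answer: the three sample pairs are made up, the helper name label_chains is hypothetical, SQLite 3.25 or later is assumed for window functions, and a CASE expression stands in for Postgres's least().

```python
import sqlite3

# Same shape as the Postgres query above, adapted to SQLite syntax.
QUERY = """
with recursive cte(id1, id2, level) as (
    select id1, id2, 1 from t
    union all
    -- extend each known pair by one more hop through t
    select t.id1, cte.id2, cte.level + 1
    from t join cte on t.id2 = cte.id1
)
select id1, id2, dense_rank() over (order by grp) as label
from (
    -- grp is the smaller of the two window minimums (least() in Postgres)
    select id1, id2, level,
           case when lo2 < lo1 then lo2 else lo1 end as grp
    from (
        select id1, id2, level,
               min(id2) over (partition by id1) as lo2,
               min(id1) over (partition by id2) as lo1
        from cte
    )
)
where level = 1
order by id1, id2
"""


def label_chains(pairs):
    """Load (id1, id2) pairs into an in-memory table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (id1 integer, id2 integer)")
        conn.executemany("insert into t values (?, ?)", pairs)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample: one chain 1 -> 2 -> 3 and a separate pair 4 -> 5.
    for row in label_chains([(1, 2), (2, 3), (4, 5)]):
        print(row)
```

    On this made-up sample the pairs (1, 2) and (2, 3) get label 1 and (4, 5) gets label 2. Like the query it mirrors, the sketch has no protection against cycles in the data.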

    EDIT:

    Egor makes a very good point. The query above assumes that the ids "descend" to smaller values. The following version instead uses the highest level for each id for the grouping (which is really what is intended):

    with 
        recursive cte(id1, id2, level) as (
         select id1, id2, 1 as level
         from t
         union all
         select t.id1, cte.id2, cte.level + 1
         from t join
              cte
              on t.id2 = cte.id1
        --  where not exists (select 1 from cte cte2 where cte2.id1 = t.id1 and cte2.id2 = t.id2) 
      ) 
    select id1, id2,
           dense_rank() over (order by topvalue) as label
    from (select id1, id2,
                 first_value(id2) over (partition by id1 order by level desc) as topvalue,
                 level
          from cte
         ) t
    where level = 1;
    

    EDIT II:

    In response to Egor's second comment: this data is a little problematic with respect to the original problem. The following breaks it into two pieces:

    with 
        recursive cte as (
         select id1, id2, id2 as last, id1||','||id2 as grp, 1 as level
         from t
         where id2 not in (select id1 from t)
         union all
         select t.id1, t.id2, cte.last, cte.grp, cte.level + 1
         from t join
              cte
              on t.id2 = cte.id1
        --  where not exists (select 1 from cte cte2 where cte2.id1 = t.id1 and cte2.id2 = t.id2) 
      ) 
    select *
    from cte;
    

    But it is not clear whether that is what the original question wanted. It would break the original data into three groups that overlap, because there are three ids in the second column that are never in the first column. The question here is about commutativity, that is, whether "A t B" should also imply "B t A".
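
    If the relation is meant to hold in both directions, one option is to add every pair in both directions before recursing, and to use union instead of union all so that duplicate rows are dropped and the recursion stops even when the data contains cycles. This is a different technique from the queries above, and only a sketch under assumptions: the sample pairs are made up, the name label_components is hypothetical, and it runs on SQLite (3.25 or later) through Python's sqlite3 module instead of Postgres.

```python
import sqlite3

QUERY = """
with recursive
edges(a, b) as (
    -- treat every pair as usable in both directions
    select id1, id2 from t
    union
    select id2, id1 from t
),
reach(id, other) as (
    -- every id reaches itself; union (not union all) drops repeats,
    -- so the recursion terminates even on cyclic data
    select a, a from edges
    union
    select reach.id, edges.b
    from reach join edges on edges.a = reach.other
),
comp(id, root) as (
    -- the smallest reachable id identifies the group
    select id, min(other) from reach group by id
)
select t.id1, t.id2, dense_rank() over (order by comp.root) as label
from t join comp on comp.id = t.id1
order by t.id1, t.id2
"""


def label_components(pairs):
    """Label each (id1, id2) pair with the number of its connected group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (id1 integer, id2 integer)")
        conn.executemany("insert into t values (?, ?)", pairs)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical sample: 3 and 8 appear only in the second column,
    # yet they tie (1, 2), (2, 3), (5, 3) and (7, 8), (9, 8) together.
    for row in label_components([(1, 2), (2, 3), (5, 3), (7, 8), (9, 8)]):
        print(row)
```

    On this made-up sample the first three pairs share label 1 and the last two share label 2. The reach table holds every connected pair of ids, so it can grow quadratically with the size of a group.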
