How to label “transitive groups” with SQL?

前端未结

关注

 2  1877

悲哀的现实 2020-12-06 03:09

I have a table with ID pairs that are in a transitive relation t, that is, if \"A t B\" AND \"B t C\" then \"A t C\". Sample:

2条回答

無奈伤痛 (楼主)

2020-12-06 03:37
You can do this in Postgres; you cannot do this in all databases. Here is the query:
```
with 
    recursive cte(id1, id2) as (
     select id1, id2, 1 as level
     from t
     union all
     select t.id1, cte.id2, cte.level + 1
     from t join
          cte
          on t.id2 = cte.id1
  )
select id1, id2,
       dense_rank() over (order by grp) as label
from (select id1, id2,
             least(min(id2) over (partition by id1), min(id1) over (partition by id2)) as grp,
             level
      from cte
     ) t
where level = 1;
```
With the SQL Fiddle here.

You are walking through a tree structure in order to assign the label (cycles might pose problems with this particular version by the way). In Postgres, you can do this using an explicit recursive CTE. In SQL Server, you can do this with a CTE that is implicitly "recursive" (the key word is not used). In Oracle, you can do this with connect by.

The recursive CTE gets all pairs that are connected to each other. The main query then assigns the minimum value of id1 and id2 to the pair, to identify all pairs that are connected to each other. The final label is produced just by assigning a sequential value to the grp.

EDIT:

Egor makes a very good point. The above is assuming that the ids "descend" to the smaller values. The following version instead uses the highest level for each id for the grouping (which is really what is intended):
```
with 
    recursive cte(id1, id2) as (
     select id1, id2, 1 as level
     from t
     union all
     select t.id1, cte.id2, cte.level + 1
     from t join
          cte
          on t.id2 = cte.id1
    --  where not exists (select 1 from cte cte2 where cte2.id1 = t.id1 and cte2.id2 = t.id2) 
  ) 
select id1, id2,
       dense_rank() over (order by topvalue) as label
from (select id1, id2,
             first_value(id2) over (partition by id1 order by level desc) as topvalue,
             level
      from cte
     ) t
where level = 1;
```
EDIT II:

In response to Egor's second comment. This data is a little problematic with respect to the original problem. The following breaks it into two pieces:
```
with 
    recursive cte as (
     select id1, id2, id2 as last, id1||','||id2 as grp, 1 as level
     from t
     where id2 not in (select id1 from t)
     union all
     select t.id1, t.id2, cte.last, cte.grp, cte.level + 1
     from t join
          cte
          on t.id2 = cte.id1
    --  where not exists (select 1 from cte cte2 where cte2.id1 = t.id1 and cte2.id2 = t.id2) 
  ) 
select *
from cte;
```
But, it is not clear if that is what the original wanted. It would break the original into three groups that overlap, because there are three ids in the second column that are never in the first column. The question here is about commutativity.
0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...