Group rows by ID and find max/min(date_from, date_to) with date gaps

こ雲淡風輕ζ 提交于 2021-01-28 11:55:39

问题


I need to group data by id and find max/min (date_from, date_to). But if there is date gap it should be new row.

I have following data:

SYS_ID  ITEM_ID DATE_FROM   DATE_TO
1       1       01.01.2019  20.01.2019
1       1       15.01.2019  10.02.2019
1       1       15.02.2019  20.02.2019
1       1       18.02.2019  10.03.2019
1       1       10.03.2019  22.03.2019
1       2       01.01.2019  10.01.2019
1       2       15.01.2019  25.01.2019

Result should be:

SYS_ID  ITEM_ID DATE_FROM   DATE_TO
1       1       01.01.2019  10.02.2019
1       1       15.02.2019  22.03.2019
1       2       01.01.2019  10.01.2019
1       2       15.01.2019  25.01.2019

Is there a way to do this without using cursor?


回答1:


Use gaps and islands approach

Live test: http://sqlfiddle.com/#!18/0174b/3

with gap_detector as
(
     select
        sys_id, item_id,
        date_from, date_to,
        case when 
            lag(date_to) 
            over(partition by sys_id, item_id order by date_from) >= date_from
        then
            0
        else
            1
        end as gap
     from tbl
 )
 , grouper as
 (
     select
         sys_id, item_id,
         date_from, date_to,
         sum(gap) over(partition by sys_id, item_id order by date_from) as grp
     from gap_detector   
)  
select
    sys_id, item_id,
    min(date_from) as date_from,
    max(date_to) as date_to
from grouper
group by sys_id, item_id, grp

Output:

| sys_id | item_id |  date_from |    date_to |
|--------|---------|------------|------------|
|      1 |       1 | 2019-01-01 | 2019-02-10 |
|      1 |       1 | 2019-02-15 | 2019-03-22 |
|      1 |       2 | 2019-01-01 | 2019-01-10 |
|      1 |       2 | 2019-01-15 | 2019-01-25 |

How it works

First we need to detect if date_to from previous row (using lag) overlaps with current date_from.

Note that we have independent sets of date_from, that is, the previous row of sys_id + item_id combo (e.g., 1,1) does not overlap with another sys_id + item_id combo (e.g., 1,2). So the first previous date_to of 1,2 is not March 22, 2019, it's NULL instead. We can properly identify the previous row of each sys_id + item_id combo by partitioning them, i.e., partition by sys_id, item_id.

With that said here is how we can identify if date_to from previous row overlaps with current date_from:

  • If the current date_from overlaps with previous date_to, don't isolate the current date_from from the previous row, we can do this by giving the current row a value of 0.
  • Otherwise, if the current date_from does not overlap with previous date_to, isolate (in other word gap) the current row from previous row, by marking the current row as a gap, we can do this by giving it a value of 1. It will come later why we need 1 and 0.

Live test: http://sqlfiddle.com/#!18/0174b/7

with gap_detector as
(
     select
        sys_id, item_id,
        date_from, date_to,
        case when 
            lag(date_to) 
            over(partition by sys_id, item_id order by date_from) >= date_from
        then
            0
        else
            1
        end as gap
     from tbl
)
select * 
from gap_detector
order by sys_id, item_id, date_from

Output:

| sys_id | item_id |  date_from |    date_to | gap |
|--------|---------|------------|------------|-----|
|      1 |       1 | 2019-01-01 | 2019-01-20 |   1 |
|      1 |       1 | 2019-01-15 | 2019-02-10 |   0 |
|      1 |       1 | 2019-02-15 | 2019-02-20 |   1 |
|      1 |       1 | 2019-02-18 | 2019-03-10 |   0 |
|      1 |       1 | 2019-03-10 | 2019-03-22 |   0 |
|      1 |       2 | 2019-01-01 | 2019-01-10 |   1 |
|      1 |       2 | 2019-01-15 | 2019-01-25 |   1 |        

Next step is to group the the islands that belong to each other by doing a running total over the gap markers (1 and 0). Running total is done by doing a sum(gap) over the window of sys_id + item_id combo.

Each window of sys_id + item_id combo can be operated on independently by doing a partition on them, i.e., partition by sys_id, item_id

Live test: http://sqlfiddle.com/#!18/0174b/12

with gap_detector as
(
     select
        sys_id, item_id,
        date_from, date_to,
        case when 
            lag(date_to) 
            over(partition by sys_id, item_id order by date_from) >= date_from
        then
            0
        else
            1
        end as gap
     from tbl
 )
 , grouper as
 (
     select
         sys_id, item_id,
         date_from, date_to,
         gap,
         sum(gap) over(partition by sys_id, item_id order by date_from) as grp
     from gap_detector   
)  
select sys_id, item_id, date_from, date_to, gap, grp
from grouper

Output:

| sys_id | item_id |  date_from |    date_to | gap | grp |
|--------|---------|------------|------------|-----|-----|
|      1 |       1 | 2019-01-01 | 2019-01-20 |   1 |   1 |
|      1 |       1 | 2019-01-15 | 2019-02-10 |   0 |   1 |
|      1 |       1 | 2019-02-15 | 2019-02-20 |   1 |   2 |
|      1 |       1 | 2019-02-18 | 2019-03-10 |   0 |   2 |
|      1 |       1 | 2019-03-10 | 2019-03-22 |   0 |   2 |
|      1 |       2 | 2019-01-01 | 2019-01-10 |   1 |   1 |
|      1 |       2 | 2019-01-15 | 2019-01-25 |   1 |   2 |

Finally, now that we are able to identify which islands belong to each other (denoted by grp), it's just a matter of doing a group by on those grp markers to identify when date_from and date_to started on each group (grp) of islands.

Live test: http://sqlfiddle.com/#!18/0174b/13

select
    sys_id, item_id,
    min(date_from) as date_from,
    max(date_to) as date_to
from grouper
group by sys_id, item_id, grp

Output:

| sys_id | item_id |  date_from |    date_to |
|--------|---------|------------|------------|
|      1 |       1 | 2019-01-01 | 2019-02-10 |
|      1 |       1 | 2019-02-15 | 2019-03-22 |
|      1 |       2 | 2019-01-01 | 2019-01-10 |
|      1 |       2 | 2019-01-15 | 2019-01-25 |


来源:https://stackoverflow.com/questions/55906748/group-rows-by-id-and-find-max-mindate-from-date-to-with-date-gaps

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!