Merge data depending on object and no gaps in dates

我的未来我决定 posted on 2021-02-20 04:25:05

Question


I have data in the DB like the rows below (additional info about the dates: the date in valid_from is inclusive, the date in valid_to is exclusive).

obj_number  obj_related  valid_from  valid_to
AA          BB           01.01.2018  01.01.2019
AA          BB           01.01.2019  31.03.2019
AA          BB           31.03.2019
AA          CC           01.01.2020  30.06.2020
AA          CC           02.07.2020  31.10.2020
AA          CC           31.10.2020  31.12.2020
AA          DD           01.01.2018  30.11.2020
AA          DD           30.11.2020  31.12.2020

I have to merge the data, but in a special way. The rows should be merged per obj_related to show the minimum valid_from and the maximum valid_to (or null if the period is still open). But if there is a gap in the dates, as for CC (rows 4 and 5), then both records should appear in the result. The easiest way to understand it is to look at the correct result:

obj_number  obj_related  valid_from  valid_to
AA          BB           01.01.2018
AA          CC           01.01.2020  30.06.2020
AA          CC           02.07.2020  31.12.2020
AA          DD           01.01.2018  31.12.2020

Oracle version: 12.1.0.2

Could you help me prepare an SQL query?


Answer 1:


Here is a different solution, which should work in Oracle 10.1 and higher - using the Tabibitosan method. The problem is slightly complicated by the use of NULL to mark an indefinite valid_to date; in particular, the definition of valid_to in the outer query can't simply be max(valid_to) within each group, since that would produce the wrong answer when valid_to may be null.

Other than that, the computation that produces the grp column in the subquery is the main idea: it yields one distinct date for each "island" in the "gaps and islands" structure of the input data. This is a lesser-known use of the Tabibitosan method; it keeps the query efficient because it needs only one level of analytic functions.

/*
with
  sample_data (...) as (...)
*/
select obj_number, obj_related, min(valid_from) as valid_from,
       max(valid_to) keep (dense_rank last order by valid_from) as valid_to
from   (
         select sd.*,
                nvl(valid_to, date '9999-12-31') - 
                  sum(nvl(valid_to, date '9999-12-31') - valid_from)
                      over (partition by obj_number, obj_related 
                            order     by valid_from) as grp
         from   sample_data sd
       )
group  by obj_number, obj_related, grp
order  by obj_number, obj_related, valid_from
;

The best way to try to understand how the Tabibitosan method works (in this case) is to run the subquery separately and to see what it produces.
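
As a rough illustration of that advice, the subquery can be run on its own against the BB and CC rows from the question. This is only a sketch: the valid_from_txt and valid_to_txt aliases and the to_char formatting are additions made here for readable output and are not part of the answer's query.

```sql
-- Sketch only: sample rows copied from the question; the *_txt aliases are illustrative.
with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'BB', date '2018-01-01', date '2019-01-01' from dual union all
    select 'AA', 'BB', date '2019-01-01', date '2019-03-31' from dual union all
    select 'AA', 'BB', date '2019-03-31', null              from dual union all
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual
  )
select obj_related,
       to_char(valid_from, 'dd.mm.yyyy') as valid_from_txt,
       to_char(valid_to,   'dd.mm.yyyy') as valid_to_txt,
       -- end of the row minus the running total of covered days
       to_char(nvl(valid_to, date '9999-12-31') -
                 sum(nvl(valid_to, date '9999-12-31') - valid_from)
                     over (partition by obj_number, obj_related
                           order     by valid_from),
               'dd.mm.yyyy') as grp
from   sample_data
order  by obj_related, valid_from;

-- OBJ_RELATED VALID_FROM_TXT VALID_TO_TXT GRP
-- BB          01.01.2018     01.01.2019   01.01.2018
-- BB          01.01.2019     31.03.2019   01.01.2018
-- BB          31.03.2019                  01.01.2018
-- CC          01.01.2020     30.06.2020   01.01.2020
-- CC          02.07.2020     31.10.2020   03.01.2020
-- CC          31.10.2020     31.12.2020   03.01.2020
```

For each row, grp equals the first valid_from of the partition plus the total length of all gaps seen so far, so the value changes only where a gap occurs (the two days between 30.06.2020 and 02.07.2020 for CC).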




Answer 2:


In Oracle 12.1 or higher, you can solve this easily with match_recognize:

alter session set nls_date_format='dd.mm.yyyy';

with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'BB', to_date('01.01.2018'), to_date('01.01.2019') from dual union all
    select 'AA', 'BB', to_date('01.01.2019'), to_date('31.03.2019') from dual union all
    select 'AA', 'BB', to_date('31.03.2019'), null                  from dual union all
    select 'AA', 'CC', to_date('01.01.2020'), to_date('30.06.2020') from dual union all
    select 'AA', 'CC', to_date('02.07.2020'), to_date('31.10.2020') from dual union all
    select 'AA', 'CC', to_date('31.10.2020'), to_date('31.12.2020') from dual union all
    select 'AA', 'DD', to_date('01.01.2018'), to_date('30.11.2020') from dual union all
    select 'AA', 'DD', to_date('30.11.2020'), to_date('31.12.2020') from dual
  )
select *
from   sample_data
match_recognize(
  partition by obj_number, obj_related
  order     by valid_from
  measures  first(valid_from) as valid_from, last(valid_to) as valid_to
  pattern   ( a b* )
  define    b as valid_from = prev (valid_to)
);

OBJ_NUMBER OBJ_RELATED  VALID_FROM VALID_TO  
---------- ------------ ---------- ----------
AA         BB           01.01.2018           
AA         CC           01.01.2020 30.06.2020
AA         CC           02.07.2020 31.12.2020
AA         DD           01.01.2018 31.12.2020

Obviously, the with clause is not part of the solution (remove it and use your actual table and column names); I included it for testing.
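
To see how the pattern assigns rows to matches, one option is to switch to all rows per match and expose match_number() and classifier() as measures. The following is a sketch under the same assumptions as the test query above, restricted to the CC rows; the aliases mn, cls and the *_txt columns are made up for this illustration.

```sql
-- Sketch only: CC rows from the question; mn, cls and the *_txt aliases are illustrative.
with
  sample_data (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual
  )
select obj_related,
       to_char(valid_from, 'dd.mm.yyyy') as valid_from_txt,
       to_char(valid_to,   'dd.mm.yyyy') as valid_to_txt,
       mn, cls
from   sample_data
match_recognize(
  partition by obj_number, obj_related
  order     by valid_from
  measures  match_number() as mn,   -- sequential number of the match (island)
            classifier()   as cls   -- pattern variable the row was mapped to
  all rows per match
  pattern   ( a b* )
  define    b as valid_from = prev (valid_to)
)
order  by obj_related, valid_from;

-- OBJ_RELATED VALID_FROM_TXT VALID_TO_TXT MN CLS
-- CC          01.01.2020     30.06.2020    1 A
-- CC          02.07.2020     31.10.2020    2 A
-- CC          31.10.2020     31.12.2020    2 B
```

Every row classified as A starts a new island; rows classified as B continue the island because their valid_from equals the previous row's valid_to.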




Answer 3:


This is a type of gaps-and-islands problem.

For your sample data, you can use lag() to see if the previous row overlaps. If not, then the row is the start of an island. A cumulative sum of the island starts defines all rows in the island -- which can be used for aggregation:

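select obj_number, obj_related, min(valid_from) as valid_from,
       -- max() ignores nulls, so return null when any row of the island is open-ended
       case when count(valid_to) = count(*) then max(valid_to) end as valid_to
from (select t.*,
             sum(case when prev_valid_to >= valid_from then 0 else 1 end) over (partition by obj_number, obj_related order by valid_from) as grp
      from (select t.*,
                   lag(valid_to) over (partition by obj_number, obj_related order by valid_from) as prev_valid_to
            from t
           ) t
     ) t
-- grp must be part of the grouping key; otherwise all islands of an object collapse into one row
group by obj_number, obj_related, grp
order by obj_number, obj_related, valid_from;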
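
To make the island numbering concrete, the two inner levels can be run without the aggregation. This is a sketch that assumes a table t holding only the CC rows from the question; the island_start column and the *_txt aliases are added here purely for illustration.

```sql
-- Sketch only: t is stubbed with the CC rows; island_start and *_txt are illustrative.
with
  t (obj_number, obj_related, valid_from, valid_to) as (
    select 'AA', 'CC', date '2020-01-01', date '2020-06-30' from dual union all
    select 'AA', 'CC', date '2020-07-02', date '2020-10-31' from dual union all
    select 'AA', 'CC', date '2020-10-31', date '2020-12-31' from dual
  )
select obj_related,
       to_char(valid_from,    'dd.mm.yyyy') as valid_from_txt,
       to_char(prev_valid_to, 'dd.mm.yyyy') as prev_valid_to_txt,
       -- 1 when the previous row does not reach this row's valid_from
       case when prev_valid_to >= valid_from then 0 else 1 end as island_start,
       -- running total of island starts = island number
       sum(case when prev_valid_to >= valid_from then 0 else 1 end)
           over (partition by obj_number, obj_related order by valid_from) as grp
from   (select t.*,
               lag(valid_to) over (partition by obj_number, obj_related
                                   order by valid_from) as prev_valid_to
        from   t
       )
order  by obj_related, valid_from;

-- OBJ_RELATED VALID_FROM_TXT PREV_VALID_TO_TXT ISLAND_START GRP
-- CC          01.01.2020                                  1   1
-- CC          02.07.2020     30.06.2020                   1   2
-- CC          31.10.2020     31.10.2020                   0   2
```

The first row of a partition has no predecessor, so prev_valid_to is null, the comparison is not true, and the row counts as an island start.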


Source: https://stackoverflow.com/questions/65940272/merge-data-depending-on-object-and-no-gaps-in-dates
