Capture changes between 2 datasets with duplicates

蹲街弑〆低调 submitted on 2020-05-17 06:27:47

Question


This is a follow-up to the question "Capture changes in 2 datasets". I need to capture the changes between two datasets based on one or more keys: one is the historical version and the other the current version of the same dataset (both share the same schema). The datasets can also contain duplicate rows. In the example below, id is the key used for comparison:

-- Table t_curr
-------
id  col
-------
1   A
1   B
2   C
3   F

-- Table t_hist
-------
id  col
-------
1   B
2   C
2   D
4   G

-- Expected output t_change
----------------
id  col change
----------------
1   A   modified   -- 'modified' because the first row for id=1 differs between the two tables
1   B   inserted
2   C   same
2   D   deleted
3   F   inserted
4   G   deleted

I'm looking for an efficient solution to get the desired output.

EDIT

Explanation: Assume the records are fetched from t_curr in the order shown and are ranked within each id:

  1. 1/A is the first record and 1/B the second record in t_curr
  2. 1/B is the first record in t_hist
  3. The first records of both datasets are compared, i.e. 1/A in t_curr with 1/B in t_hist, so 1/A is marked as modified in t_change
  4. The second record in t_curr, 1/B, has no counterpart at the same position in t_hist, so it is marked as inserted

Answer 1:


I was able to do it with a full outer join and row_number(): number the rows within each id in both tables, then join on id and row number. Query:

with t_hist as (
select 1 as id, 'B' as col union all
select 2 as id, 'C' as col union all
select 2 as id, 'D' as col union all
select 4 as id, 'G' as col
),
t_curr as (
select 1 as id1,    'A' as col1 union all
select 1 as id1,    'B' as col1 union all
select 2 as id1,    'C' as col1 union all
select 3 as id1,    'F' as col1
)

select
  coalesce(id1, id) as id_,
  coalesce(col1, col) as col_,
  case
    when id is null then 'inserted'   -- row exists only in t_curr
    when id1 is null then 'deleted'   -- row exists only in t_hist
    when col = col1 then 'same'
    else 'modified'
  end as change
from (
  select t_curr.*, t_hist.*
  -- note: ordering by the partition key leaves the row order within each id arbitrary;
  -- order by a real sequencing column if one exists
  from (select *, row_number() over (partition by id1 order by id1) as r1 from t_curr) t_curr
  full outer join (select *, row_number() over (partition by id order by id) as r from t_hist) t_hist
    on id1 = id and r1 = r   -- pair the n-th row of each id across both tables
) t
order by id_
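
For anyone who wants to check the pairing logic outside the database, here is a minimal Python sketch of what the query above does: rows are grouped by id, paired by position within each id, and labelled. The function name capture_changes and the tuple layout are my own assumptions for illustration, not part of the original query. It also assumes col is never NULL/None and that input order stands in for the row number.

```python
from collections import defaultdict
from itertools import zip_longest


def capture_changes(curr, hist):
    """Pair rows of curr and hist by (id, position within id) and label each pair.

    Hypothetical helper: rows are (id, col) tuples and col is assumed never None.
    """
    def group(rows):
        groups = defaultdict(list)
        for key, col in rows:
            groups[key].append(col)  # input order plays the role of row_number()
        return groups

    curr_by_id, hist_by_id = group(curr), group(hist)
    changes = []
    # The union of keys mirrors the full outer join on id.
    for key in sorted(curr_by_id.keys() | hist_by_id.keys()):
        pairs = zip_longest(curr_by_id.get(key, []), hist_by_id.get(key, []))
        for curr_col, hist_col in pairs:
            if hist_col is None:
                changes.append((key, curr_col, "inserted"))  # only in curr
            elif curr_col is None:
                changes.append((key, hist_col, "deleted"))   # only in hist
            elif curr_col == hist_col:
                changes.append((key, curr_col, "same"))
            else:
                changes.append((key, curr_col, "modified"))
    return changes


t_curr = [(1, "A"), (1, "B"), (2, "C"), (3, "F")]
t_hist = [(1, "B"), (2, "C"), (2, "D"), (4, "G")]

for row in capture_changes(t_curr, t_hist):
    print(row)
```

With the sample data from the question this produces the same six rows as t_change. As in the SQL version, the result depends on the order of rows within each id, so it is only stable if that order is well defined.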


Source: https://stackoverflow.com/questions/61570412/capture-changes-between-2-datasets-with-duplicates
