Question
This is a follow-up to the question "Capture changes in 2 datasets".
I need to capture the changes between 2 datasets based on one or more keys: a historical version and a current version of the same dataset (both share the same schema). The datasets can also contain duplicate rows. In the example below, id is the key used for comparison:
-- Table t_curr
-------
id col
-------
1 A
1 B
2 C
3 F
-- Table t_hist
-------
id col
-------
1 B
2 C
2 D
4 G
-- Expected output t_change
----------------
id col change
----------------
1 A modified -- status is 'modified' because the first row for id=1 differs between the two tables
1 B inserted
2 C same
2 D deleted
3 F inserted
4 G deleted
I'm looking for an efficient solution to get the desired output.
EDIT
Explanation: suppose records are fetched from t_curr in the order shown and ranked with respect to id. Then 1/A is the first and 1/B the second record in t_curr, while 1/B is the first record in t_hist.
- The 1st record of each dataset is compared, i.e. 1/A in t_curr is compared with 1/B in t_hist; hence 1/A is marked as modified in t_change.
- Since 1/B at rank 2 is present only in t_curr, it is marked inserted.
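The rank-and-pair rule described above can be sketched in plain Python, independent of any database. This is a minimal sketch assuming each table arrives as a list of (id, col) tuples whose list order stands in for the fetch order; the function name capture_changes and the sentinel _MISSING are hypothetical, not part of the question. Grouping rows by key and zipping the two groups with itertools.zip_longest pairs the n-th current row with the n-th historical row for the same key, which is the same pairing a full outer join on (id, rank) produces.

```python
from collections import defaultdict
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel: "no row at this rank on this side"


def capture_changes(curr, hist):
    """Classify rows by pairing them on (key, rank within key).

    curr and hist are lists of (id, col) tuples; list order is assumed to be
    the order in which rows are fetched from t_curr and t_hist.
    """
    def by_key(rows):
        groups = defaultdict(list)
        for key, value in rows:
            groups[key].append(value)  # position in the list = rank within key
        return groups

    c, h = by_key(curr), by_key(hist)
    changes = []
    for key in sorted(c.keys() | h.keys()):
        # zip_longest pads the shorter side, like a full outer join on (id, rank)
        for cv, hv in zip_longest(c.get(key, []), h.get(key, []), fillvalue=_MISSING):
            if hv is _MISSING:
                changes.append((key, cv, "inserted"))
            elif cv is _MISSING:
                changes.append((key, hv, "deleted"))
            elif cv == hv:
                changes.append((key, cv, "same"))
            else:
                changes.append((key, cv, "modified"))
    return changes


t_curr = [(1, "A"), (1, "B"), (2, "C"), (3, "F")]
t_hist = [(1, "B"), (2, "C"), (2, "D"), (4, "G")]
for row in capture_changes(t_curr, t_hist):
    print(row)
```

With the sample data this reproduces the expected t_change rows. Apart from sorting the distinct keys, the work is a single pass over each input.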
Answer 1:
I was able to do it using a full outer join and row_number(). Query:
with t_hist as (
select 1 as id, 'B' as col union all
select 2 as id, 'C' as col union all
select 2 as id, 'D' as col union all
select 4 as id, 'G' as col
),
t_curr as (
select 1 as id1, 'A' as col1 union all
select 1 as id1, 'B' as col1 union all
select 2 as id1, 'C' as col1 union all
select 3 as id1, 'F' as col1
)
select
coalesce(id1, id) as id_,
coalesce(col1, col) as col_,
case
when id is null then 'inserted' -- (id, rank) exists only in t_curr
when id1 is null then 'deleted' -- (id, rank) exists only in t_hist
when col = col1 then 'same'
else 'modified'
end
as change
from
(select c.*, h.*
-- ordering by the key itself leaves the rank among duplicates up to the engine;
-- order by a load-order column instead if the table has one
from (select *, row_number() over (partition by id1 order by id1) as r1 from t_curr) c
full outer join (select *, row_number() over (partition by id order by id) as r from t_hist) h
on c.id1 = h.id and c.r1 = h.r
) t
order by id_, col_
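To try the answer's approach end to end, here is a sketch using Python's built-in sqlite3 module. It assumes SQLite 3.25 or later (for window functions). It uses SQLite's implicit rowid as a stand-in for fetch order, which is SQLite-specific, and it emulates the full outer join with a LEFT JOIN plus an anti-joined UNION ALL branch, because FULL OUTER JOIN only arrived in SQLite 3.39. The table layout mirrors the question; the CTE names c, h, paired and the variable QUERY are illustrative.

```python
import sqlite3

# In-memory database populated with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t_hist (id integer, col text);
    create table t_curr (id integer, col text);
    insert into t_hist values (1, 'B'), (2, 'C'), (2, 'D'), (4, 'G');
    insert into t_curr values (1, 'A'), (1, 'B'), (2, 'C'), (3, 'F');
""")

QUERY = """
with c as (
    -- rank duplicates per key; rowid approximates insertion (fetch) order
    select id, col, row_number() over (partition by id order by rowid) as r
    from t_curr
),
h as (
    select id, col, row_number() over (partition by id order by rowid) as r
    from t_hist
),
paired as (
    -- emulated full outer join on (id, rank): every current row ...
    select c.id as id1, c.col as col1, h.id as id, h.col as col
    from c left join h on c.id = h.id and c.r = h.r
    union all
    -- ... plus historical rows that have no current partner
    select null, null, h.id, h.col
    from h left join c on c.id = h.id and c.r = h.r
    where c.id is null
)
select
    coalesce(id1, id) as id_,
    coalesce(col1, col) as col_,
    case
        when id is null then 'inserted'
        when id1 is null then 'deleted'
        when col = col1 then 'same'
        else 'modified'
    end as change
from paired
order by id_, col_
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Ordering by rowid makes the ranking deterministic for this sample; in another engine, the equivalent would be whatever column records load order, if the table has one.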
Source: https://stackoverflow.com/questions/61570412/capture-changes-between-2-datasets-with-duplicates