I have two tables, t1 and t2:
t1
person | visit | code1 | type1
1 1 50 50
1 1 50 50
1 2 7
You can replicate a SAS merge by adding a row_number()
to each table:
select t1.*, t2.*
from (select t1.*,
row_number() over (partition by person, visit order by ??) as seqnum
from t1
) t1 full outer join
(select t2.*,
row_number() over (partition by person, visit order by ??) as seqnum
from t2
) t2
on t1.person = t2.person and t1.visit = t2.visit and
t1.seqnum = t2.seqnum;
Notes:
??
means to put in the column(s) used for ordering. SAS datasets have an intrinsic order. SQL tables do not, so the ordering needs to be specified.t1.*, t2.*
in the outer query). I think SAS only includes person
and visit
once in the resulting dataset.EDIT:
Note: the above produces separate values for the key columns. This is easy enough to fix:
select coalesce(t1.person, t2.person) as person,
coalesce(t1.key, t2.key) as key,
t1.code1, t1.type1, t2.code2, t2.type2
from (select t1.*,
row_number() over (partition by person, visit order by ??) as seqnum
from t1
) t1 full outer join
(select t2.*,
row_number() over (partition by person, visit order by ??) as seqnum
from t2
) t2
on t1.person = t2.person and t1.visit = t2.visit and
t1.seqnum = t2.seqnum;
That fixes the columns issue. You can fix the copying issue by using first_value()
/last_value()
or by using a more complicated join
condition:
select coalesce(t1.person, t2.person) as person,
coalesce(t1.visit, t2.visit) as visit,
t1.code1, t1.type1, t2.code2, t2.type2
from (select t1.*,
count(*) over (partition by person, visit) as cnt,
row_number() over (partition by person, visit order by ??) as seqnum
from t1
) t1 full outer join
(select t2.*,
count(*) over (partition by person, visit) as cnt,
row_number() over (partition by person, visit order by ??) as seqnum
from t2
) t2
on t1.person = t2.person and t1.visit = t2.visit and
(t1.seqnum = t2.seqnum or
(t1.cnt > t2.cnt and t1.seqnum > t2.seqnum and t2.seqnum = t2.cnt) or
(t2.cnt > t1.cnt and t2.seqnum > t1.seqnum and t1.seqnum = t1.cnt)
This implements the "keep the last row" logic in a single join. Probably for performance reasons, you would want to put this into separate left join
s on the original logic.