How to replicate a SAS merge

一向 2020-12-22 14:30

I have two tables, t1 and t2:

t1
  person | visit | code1 | type1
       1 |     1 |    50 |    50
       1 |     1 |    50 |    50
       1 |     2 |     7 |

2 Answers
  •  -上瘾入骨i
    2020-12-22 15:14

    You can replicate a SAS merge by adding a row_number() to each table:

    select t1.*, t2.*
    from (select t1.*,
                 row_number() over (partition by person, visit order by ??) as seqnum
          from t1
         ) t1 full outer join
         (select t2.*,
                 row_number() over (partition by person, visit order by ??) as seqnum
          from t2
         ) t2
         on t1.person = t2.person and t1.visit = t2.visit and
            t1.seqnum = t2.seqnum;
    

    Notes:

    • The ?? means to put in the column(s) used for ordering. SAS datasets have an intrinsic order. SQL tables do not, so the ordering needs to be specified.
    • You should list the columns explicitly (instead of using t1.*, t2.* in the outer query). I think SAS only includes person and visit once in the resulting dataset.
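
    As a small illustration of the first note, here is a sketch using Python's sqlite3 module that numbers the rows of t1 within each (person, visit) group. The ord column is an assumption standing in for ??; it is not part of the original table, and any column that fixes the row order would do. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ord is an assumed ordering column, added only for this illustration.
    create table t1 (person int, visit int, code1 int, type1 int, ord int);
    insert into t1 values (1, 1, 50, 50, 1), (1, 1, 50, 50, 2), (1, 2, 7, null, 3);
""")

# Number the rows inside each (person, visit) group, in ord order.
numbered = conn.execute("""
    select person, visit, ord,
           row_number() over (partition by person, visit order by ord) as seqnum
    from t1
    order by person, visit, seqnum
""").fetchall()

for row in numbered:
    print(row)
# (1, 1, 1, 1)
# (1, 1, 2, 2)
# (1, 2, 3, 1)
```

    The two rows for visit 1 get seqnum 1 and 2, and the numbering restarts at 1 for visit 2.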

    EDIT:

    Note: the above produces separate values for the key columns. This is easy enough to fix:

    select coalesce(t1.person, t2.person) as person,
           coalesce(t1.visit, t2.visit) as visit,
           t1.code1, t1.type1, t2.code2, t2.type2
    from (select t1.*,
                 row_number() over (partition by person, visit order by ??) as seqnum
          from t1
         ) t1 full outer join
         (select t2.*,
                 row_number() over (partition by person, visit order by ??) as seqnum
          from t2
         ) t2
         on t1.person = t2.person and t1.visit = t2.visit and
            t1.seqnum = t2.seqnum;
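
    To see what this query returns, here is a sketch that runs it on a small sample through Python's sqlite3 module. Several things are assumptions: the ord column stands in for ??, the t2 rows are invented sample data, and the full outer join is emulated with a left join plus a union all, because SQLite versions before 3.39 do not support full outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ord is an assumed ordering column; the t2 rows are invented sample data.
    create table t1 (person int, visit int, code1 int, type1 int, ord int);
    create table t2 (person int, visit int, code2 int, type2 int, ord int);
    insert into t1 values (1, 1, 50, 50, 1), (1, 1, 50, 50, 2), (1, 2, 7, null, 3);
    insert into t2 values (1, 1, 8, 80, 1), (1, 2, 9, 90, 2),
                          (1, 2, 10, 100, 3), (1, 3, 11, 110, 4);
""")

rows = conn.execute("""
    with a as (
        select t1.*,
               row_number() over (partition by person, visit order by ord) as seqnum
        from t1
    ), b as (
        select t2.*,
               row_number() over (partition by person, visit order by ord) as seqnum
        from t2
    ), j as (
        -- Emulated full outer join: every a row with its partner (if any),
        -- followed by the b rows that have no partner in a.
        select a.person as p1, a.visit as v1, a.seqnum as s1,
               a.code1 as code1, a.type1 as type1,
               b.person as p2, b.visit as v2, b.seqnum as s2,
               b.code2 as code2, b.type2 as type2
        from a left join b
             on a.person = b.person and a.visit = b.visit and a.seqnum = b.seqnum
        union all
        select null, null, null, null, null,
               b.person, b.visit, b.seqnum, b.code2, b.type2
        from b
        where not exists (select 1 from a
                          where a.person = b.person and a.visit = b.visit
                            and a.seqnum = b.seqnum)
    )
    -- coalesce picks the key from whichever side is present.
    select coalesce(p1, p2) as person, coalesce(v1, v2) as visit,
           code1, type1, code2, type2
    from j
    order by person, visit, coalesce(s1, s2)
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 50, 50, 8, 80)
# (1, 1, 50, 50, None, None)
# (1, 2, 7, None, 9, 90)
# (1, 2, None, None, 10, 100)
# (1, 3, None, None, 11, 110)
```

    Each output row has a single person and visit, and rows without a partner on the other side show nulls for that side's columns.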
    

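    That fixes the columns issue. The copying issue remains: when one table has fewer rows for a person/visit than the other, SAS carries the values from the last row of the shorter side into the remaining rows, whereas the query above leaves them null. You can fix that by using first_value()/last_value() or by using a more complicated join condition: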

    select coalesce(t1.person, t2.person) as person,
           coalesce(t1.visit, t2.visit) as visit,
           t1.code1, t1.type1, t2.code2, t2.type2
    from (select t1.*,
                 count(*) over (partition by person, visit) as cnt,
                 row_number() over (partition by person, visit order by ??) as seqnum
          from t1
         ) t1 full outer join
         (select t2.*,
                 count(*) over (partition by person, visit) as cnt,
                 row_number() over (partition by person, visit order by ??) as seqnum
          from t2
         ) t2
         on t1.person = t2.person and t1.visit = t2.visit and
            (t1.seqnum = t2.seqnum or
            (t1.cnt > t2.cnt and t1.seqnum > t2.seqnum and t2.seqnum = t2.cnt) or
            (t2.cnt > t1.cnt and t2.seqnum > t1.seqnum and t1.seqnum = t1.cnt));
    

    This implements the "keep the last row" logic in a single join. For performance, you would probably want to split this into separate left joins layered on the original seqnum join rather than rely on the or conditions.
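
    The effect of the extended join condition can be checked on a small sample. The following is a sketch using Python's sqlite3 module, under several assumptions: the ord column stands in for ??, the t2 rows are invented sample data, and the full outer join is emulated with a left join plus a union all, because SQLite versions before 3.39 do not support full outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ord is an assumed ordering column; the t2 rows are invented sample data.
    create table t1 (person int, visit int, code1 int, type1 int, ord int);
    create table t2 (person int, visit int, code2 int, type2 int, ord int);
    insert into t1 values (1, 1, 50, 50, 1), (1, 1, 50, 50, 2), (1, 2, 7, null, 3);
    insert into t2 values (1, 1, 8, 80, 1), (1, 2, 9, 90, 2),
                          (1, 2, 10, 100, 3), (1, 3, 11, 110, 4);
""")

# Same seqnum, or the extra rows of the longer side paired with the
# last row (seqnum = cnt) of the shorter side.
match = """
    a.person = b.person and a.visit = b.visit and
    (a.seqnum = b.seqnum or
     (a.cnt > b.cnt and a.seqnum > b.seqnum and b.seqnum = b.cnt) or
     (b.cnt > a.cnt and b.seqnum > a.seqnum and a.seqnum = a.cnt))
"""

rows = conn.execute(f"""
    with a as (
        select t1.*,
               count(*) over (partition by person, visit) as cnt,
               row_number() over (partition by person, visit order by ord) as seqnum
        from t1
    ), b as (
        select t2.*,
               count(*) over (partition by person, visit) as cnt,
               row_number() over (partition by person, visit order by ord) as seqnum
        from t2
    ), j as (
        -- Emulated full outer join: a rows with their partners, then
        -- the b rows that match no a row at all.
        select a.person as p1, a.visit as v1, a.seqnum as s1,
               a.code1 as code1, a.type1 as type1,
               b.person as p2, b.visit as v2, b.seqnum as s2,
               b.code2 as code2, b.type2 as type2
        from a left join b on {match}
        union all
        select null, null, null, null, null,
               b.person, b.visit, b.seqnum, b.code2, b.type2
        from b
        where not exists (select 1 from a where {match})
    )
    select coalesce(p1, p2) as person, coalesce(v1, v2) as visit,
           code1, type1, code2, type2
    from j
    order by person, visit, s1, s2
""").fetchall()

for row in rows:
    print(row)
# (1, 1, 50, 50, 8, 80)
# (1, 1, 50, 50, 8, 80)
# (1, 2, 7, None, 9, 90)
# (1, 2, 7, None, 10, 100)
# (1, 3, None, None, 11, 110)
```

    For visit 1 the single t2 row is repeated against both t1 rows, and for visit 2 the single t1 row is repeated against both t2 rows. Visit 3 exists only in t2, so its t1 columns stay null.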
