How to replicate a SAS merge

一向 2020-12-22 14:30

I have two tables, t1 and t2:

t1
  person | visit | code1 | type1
       1 |     1 |    50 |    50
       1 |     1 |    50 |    50
       1 |     2 |     7 |


        
2 Answers
  •  再見小時候
    2020-12-22 15:30

    Gordon's answer is close, but it misses one point. Here's its output:

    person  visit   code1   type1   seqnum  person  visit   code2   type2   seqnum
    1       1       1       1       1       1       1       1       1       1
    1       1       2       2       2       1       1       2       2       2
    NULL    NULL    NULL    NULL    NULL    1       1       3       3       3
    1       2       1       3       1       NULL    NULL    NULL    NULL    NULL
    

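    The third row's NULLs are incorrect, while the fourth row's are correct. A SAS merge would not leave code1 and type1 empty in the third row: it carries the last t1 values for that person/visit forward.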

    As far as I know, there isn't a really good way to do this in SQL other than splitting it up into a few queries. I think there are five possibilities:

    • Matching person/visit, Matching seqnums
    • Matching person/visit, Left has more seqnums
    • Matching person/visit, Right has more seqnums
    • Left has unmatched person/visit
    • Right has unmatched person/visit

    I think the last two could be worked into one query, but the second and third have to be separate queries. You can union everything together, of course.
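To make the five possibilities concrete, here is a minimal Python sketch that labels each person/visit group by comparing row counts on the two sides. The function name `classify_groups` and the tuple layout `(person, visit, code, type)` are assumptions made for illustration; the sample rows are the ones the answer's SQL uses. When one side has more rows, the rows up to the shorter side's count still pair by seqnum, and the label describes what happens to the remainder.

```python
from collections import Counter

def classify_groups(t1_rows, t2_rows):
    """Label each (person, visit) key with one of the five cases."""
    # Count rows per (person, visit) on each side.
    left = Counter((r[0], r[1]) for r in t1_rows)
    right = Counter((r[0], r[1]) for r in t2_rows)
    cases = {}
    for key in sorted(left.keys() | right.keys()):
        n_left, n_right = left[key], right[key]  # a missing key counts as 0
        if n_right == 0:
            cases[key] = "left has unmatched person/visit"
        elif n_left == 0:
            cases[key] = "right has unmatched person/visit"
        elif n_left == n_right:
            cases[key] = "matching seqnums"
        elif n_left > n_right:
            cases[key] = "left has more seqnums"
        else:
            cases[key] = "right has more seqnums"
    return cases

# Rows are (person, visit, code, type).
t1 = [(1, 1, 1, 1), (1, 1, 2, 2), (1, 2, 1, 3)]
t2 = [(1, 1, 1, 1), (1, 1, 2, 2), (1, 1, 3, 3)]
print(classify_groups(t1, t2))
```

With this data, person 1 / visit 1 falls under "right has more seqnums" and person 1 / visit 2 under "left has unmatched person/visit".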

    So here's an example, using some temporary tables that are better suited to showing what's going on. Note that the third row is now filled in for code1 and type1, even though those are 'extra'. I've only added three of the five cases, the three that occur in your initial example (matching seqnums, right has more seqnums, and left has an unmatched person/visit), but the other two aren't too hard.

    Note that this is an example of something that is far faster in SAS, because SAS has a row-wise concept, i.e., it's capable of going one row at a time. SQL tends to take a lot longer at this kind of task on large tables, unless it's possible to partition things very neatly and have very good indexes, and even then I've never seen a SQL DBA do anywhere near as well as SAS at some of these types of problems. That's something you'll have to accept, of course. SQL has its own advantages, one of which is probably price.
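To illustrate what going one row at a time means here, the following Python sketch walks both tables group by group and pairs rows in order, repeating the last values of the shorter side once it runs out. It assumes rows are `(person, visit, code, type)` tuples and that a SAS merge carries the last values forward within a person/visit group, which is the behaviour the filled-in third row reflects. `sas_style_merge` is a hypothetical name, not a SAS or library function.

```python
from itertools import groupby, zip_longest

def sas_style_merge(t1_rows, t2_rows):
    """Pair rows within each (person, visit) group in order.

    When one side runs out of rows, its last values are repeated;
    a side with no rows at all for the key contributes None values.
    """
    key = lambda r: (r[0], r[1])
    # sorted() is stable, so the original order inside each group is kept.
    left = {k: list(g) for k, g in groupby(sorted(t1_rows, key=key), key=key)}
    right = {k: list(g) for k, g in groupby(sorted(t2_rows, key=key), key=key)}
    out = []
    for k in sorted(left.keys() | right.keys()):
        last_l = last_r = (None, None)  # reset at the start of every group
        for l, r in zip_longest(left.get(k, []), right.get(k, [])):
            if l is not None:
                last_l = l[2:]
            if r is not None:
                last_r = r[2:]
            out.append(k + last_l + last_r)
    return out

t1 = [(1, 1, 1, 1), (1, 1, 2, 2), (1, 2, 1, 3)]
t2 = [(1, 1, 1, 1), (1, 1, 2, 2), (1, 1, 3, 3)]
for row in sas_style_merge(t1, t2):
    print(row)  # (person, visit, code1, type1, code2, type2)
```

The single pass over each group is what a row-wise engine gets for free; the SQL version has to reconstruct the same pairing with row numbers and several joins.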

    Here's my example code. I'm sure it's not terribly elegant; hopefully one of the SQL folk can improve it. This is written to work in SQL Server (using table variables). The same thing should work in other variants with some changes (to use temporary tables), assuming they implement windowing. (SAS of course can't do this particular thing, as even FedSQL implements ANSI 1999, not ANSI 2008.) This is based on Gordon's initial query, modified with the additional bits at the end. Anyone who wants to improve this, please feel free to edit it and/or copy any bit you wish to a new or existing answer.

    declare @t1 table (person INT, visit INT, code1 INT, type1 INT);
    declare @t2 table (person INT, visit INT, code2 INT, type2 INT);
    
    
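    -- Sample data: t1 has two rows for visit 1 and one row for visit 2
    insert into @t1 values (1,1,1,1);
    insert into @t1 values (1,1,2,2);
    insert into @t1 values (1,2,1,3);
    
    -- t2 has three rows for visit 1 and none for visit 2
    insert into @t2 values (1,1,1,1);
    insert into @t2 values (1,1,2,2);
    insert into @t2 values (1,1,3,3);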
    
    -- Case 1: matching person/visit, matching seqnums
    select coalesce(t1.person, t2.person) as person, coalesce(t1.visit, t2.visit) as visit,
                    t1.code1, t1.type1, t2.code2, t2.type2
    from (select *,
                 row_number() over (partition by person, visit order by type1) as seqnum
          from @t1
         ) t1 inner join
         (select *,
                 row_number() over (partition by person, visit order by type2) as seqnum
          from @t2
         ) t2
         on t1.person = t2.person and t1.visit = t2.visit and
            t1.seqnum = t2.seqnum
     union all
    
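    -- Case 3: matching person/visit, right has more seqnums.
    -- The last t1 row of each group is repeated for every extra t2 row.
    select coalesce(t1.person, t2.person) as person, coalesce(t1.visit, t2.visit) as visit,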
                    t1.code1, t1.type1, t2.code2, t2.type2
    from (
          (select person, visit, MAX(seqnum) as max_rownum from (
            select person, visit, 
                 row_number() over (partition by person, visit order by type1) as seqnum
          from @t1) t1_f 
          group by person, visit
         ) t1_m inner join
         (select *, row_number() over (partition by person, visit order by type1) as seqnum
           from @t1
          ) t1 
            on t1.person=t1_m.person and t1.visit=t1_m.visit
            and t1.seqnum=t1_m.max_rownum
            inner join
         (select *,
                 row_number() over (partition by person, visit order by type2) as seqnum
          from @t2
         ) t2
         on t1.person = t2.person and t1.visit = t2.visit and
            t1.seqnum < t2.seqnum 
         )
     union all
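     -- Case 4: left has an unmatched person/visit (no t2 row for that key at all)
     select t1.person, t1.visit, t1.code1, t1.type1, t2.code2, t2.type2
         from @t1 t1 left join @t2 t2
         on t2.person=t1.person and t2.visit=t1.visit
         where t2.person is null;  -- test the join key, since code2 itself could legitimately be NULL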
    
