How to Find Rows which are Duplicates by a Key but Not Duplicates in All Columns?

逝去的感伤 2020-12-30 05:31

I am working with a table which is an extract of a set of other tables. All of the rows of the extract table should be unique according to keys D1, D2 and D3. They are not.

5 answers
  •  天命终不由人
    2020-12-30 06:13

    I haven't had a chance to try Conrad's answer yet, but came up with one of my own. It's rather a "duh" moment.

    So, if you want to find all the rows in set A except for those that are in set B, you use the EXCEPT operator:

    -- Keys (D1, D2, D3) that occur on more than one row
    ;WITH KEYDUPLICATES(D1,D2,D3) AS
    (
        SELECT D1, D2, D3
        FROM SOURCE
        GROUP BY D1, D2, D3
        HAVING COUNT(*)>1
    ),
    -- Every source row that carries one of those duplicated keys
    KEYDUPLICATEROWS AS
    (
        SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
        FROM SOURCE S
        INNER JOIN KEYDUPLICATES D
            ON S.D1 = D.D1 AND S.D2 = D.D2 AND S.D3 = D.D3
    ),
    -- Rows that are duplicated across all six columns
    FULLDUPLICATES AS
    (
        SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
        FROM SOURCE S
        GROUP BY S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
        HAVING COUNT(*)>1
    )
    -- Key-duplicate rows minus full duplicates
    SELECT KR.D1, KR.D2, KR.D3, KR.C4, KR.C5, KR.C6
    FROM KEYDUPLICATEROWS AS KR
    EXCEPT
    SELECT FD.D1, FD.D2, FD.D3, FD.C4, FD.C5, FD.C6
    FROM FULLDUPLICATES AS FD
    ORDER BY D1, D2, D3, C4, C5, C6
    

    This seems to be showing me 1500 rows that are duplicates across (D1,D2,D3) but not across all of (D1,D2,D3,C4,C5,C6); they match on only a subset of those columns. In fact, they appear to be duplicates across (D1,D2,D3,C4,C5).
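
To see the EXCEPT pattern on something small, here is a minimal sketch that runs an equivalent query against an in-memory SQLite database from Python. The sample rows are made up for illustration, and SQLite stands in for whatever engine the real SOURCE table lives in, so treat it as a toy check rather than a drop-in replacement.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the query above.
rows = [
    (1, 1, 1, "a", "b", "c"),  # key (1,1,1): two identical rows
    (1, 1, 1, "a", "b", "c"),
    (2, 2, 2, "a", "b", "c"),  # key (2,2,2): same key, C6 differs
    (2, 2, 2, "a", "b", "d"),
    (3, 3, 3, "x", "y", "z"),  # key (3,3,3): unique, never a candidate
    (4, 4, 4, "a", "b", "c"),  # key (4,4,4): two identical rows plus one odd row
    (4, 4, 4, "a", "b", "c"),
    (4, 4, 4, "a", "b", "e"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SOURCE (D1 INTEGER, D2 INTEGER, D3 INTEGER, C4 TEXT, C5 TEXT, C6 TEXT)"
)
conn.executemany("INSERT INTO SOURCE VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
WITH KEYDUPLICATES AS (
    SELECT D1, D2, D3
    FROM SOURCE
    GROUP BY D1, D2, D3
    HAVING COUNT(*) > 1
),
KEYDUPLICATEROWS AS (
    SELECT S.D1, S.D2, S.D3, S.C4, S.C5, S.C6
    FROM SOURCE S
    INNER JOIN KEYDUPLICATES D
        ON S.D1 = D.D1 AND S.D2 = D.D2 AND S.D3 = D.D3
),
FULLDUPLICATES AS (
    SELECT D1, D2, D3, C4, C5, C6
    FROM SOURCE
    GROUP BY D1, D2, D3, C4, C5, C6
    HAVING COUNT(*) > 1
)
SELECT D1, D2, D3, C4, C5, C6 FROM KEYDUPLICATEROWS
EXCEPT
SELECT D1, D2, D3, C4, C5, C6 FROM FULLDUPLICATES
ORDER BY D1, D2, D3, C4, C5, C6
"""

# Rows that share a key with another row but are not exact copies of any row.
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With this data the exact copies under keys (1,1,1) and (4,4,4) drop out, while both rows of key (2,2,2) and the odd row of key (4,4,4) remain. Note that EXCEPT also de-duplicates its output, so each surviving row appears once.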

    How to confirm that will be the subject of another question.
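
One possible starting point for that check, offered only as a sketch: count the distinct values of each non-key column within every duplicated key. A key whose rows agree on C4 and C5 but differ on C6 will show counts of 1, 1 and more than 1. The data below is invented, SQLite is again assumed as a stand-in engine, and COUNT(DISTINCT ...) ignores NULLs, so nullable columns would need extra care.

```python
import sqlite3

# Hypothetical sample rows for illustration only.
rows = [
    (2, 2, 2, "a", "b", "c"),  # differs only in C6
    (2, 2, 2, "a", "b", "d"),
    (5, 5, 5, "a", "b", "c"),  # differs only in C5
    (5, 5, 5, "a", "x", "c"),
    (6, 6, 6, "p", "q", "r"),  # unique key, filtered out by HAVING
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SOURCE (D1 INTEGER, D2 INTEGER, D3 INTEGER, C4 TEXT, C5 TEXT, C6 TEXT)"
)
conn.executemany("INSERT INTO SOURCE VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
SELECT D1, D2, D3,
       COUNT(*)           AS N_ROWS,
       COUNT(DISTINCT C4) AS N_C4,
       COUNT(DISTINCT C5) AS N_C5,
       COUNT(DISTINCT C6) AS N_C6
FROM SOURCE
GROUP BY D1, D2, D3
HAVING COUNT(*) > 1
ORDER BY D1, D2, D3
"""

# One summary row per duplicated key: (D1, D2, D3, N_ROWS, N_C4, N_C5, N_C6).
summary = conn.execute(query).fetchall()

# Keys whose rows agree on C4 and C5, i.e. duplicates across (D1,D2,D3,C4,C5).
same_through_c5 = [r[:3] for r in summary if r[4] == 1 and r[5] == 1]
print(summary)
print(same_through_c5)
```

If every duplicated key in the real table lands in the second list, that would support the guess that the rows are duplicates across (D1,D2,D3,C4,C5).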
