How get the T-SQL code to find duplicates?

前端 未结 5 612
清酒与你
清酒与你 2020-12-12 20:36

MS Access has a button to generate sql code for finding duplicated rows. I don\'t know if SQL Server 2005/2008 Managment Studio has this.

  1. If it has, please

相关标签:
5条回答
  • 2020-12-12 20:56

    Well, if you have entire rows as duplicates in your table, you've at least not got a primary key set up for that table, otherwise at least the primary key value would be different.

    However, here's how to build a SQL to get duplicates over a set of columns:

    SELECT col1, col2, col3, col4
    FROM table
    GROUP BY col1, col2, col3, col4
    HAVING COUNT(*) > 1
    

    This will find rows which, for columns col1-col4, has the same combination of values, more than once.

    For instance, in the following table, rows 2+3 would be duplicates:

    PK    col1    col2    col3    col4    col5
    1       1       2       3       4      6
    2       1       3       4       7      7
    3       1       3       4       7      10
    4       2       3       1       4      5
    

    The two rows share common values in columns col1-col4, and thus, by that SQL, is considered duplicates. Expand the list of columns to contain all the columns you wish to analyze this for.

    0 讨论(0)
  • 2020-12-12 21:01

    Another way one can do this is by joining a table on itself.

    SELECT *
    FROM dbo.TableA aBase
    JOIN dbo.TableA aDupes ON aDupes.ColA = aBase.ColA AND
                              aDupes.ColB = aBase.ColB
    WHERE aBase.Pkey < aDupes.Pkey
    

    Note: The aBase.Pkey < aDupes.Pkey is there because joining a table against itself will create two rows per match since the condition will always be true twice.

    In other words: If table aBase has a row equal to a row from aDupes (based on ColA and ColB), the reflection of that match will also be true - that aDupes has a row equal to a row aBase based on ColA and ColB. Therefore both of those matches will be returned in the result set.

    Narrow this down/eliminate this reflection by arbitrarily picking all results where one of the tables has a lower key.

    < or > doesn't matter, as long as the keys are different.

    This also takes care of filtering out matches with a row upon itself because aBase.Pkey < aDupes.Pkey forces the primary keys to be different.

    0 讨论(0)
  • 2020-12-12 21:03

    I found this solution when I need to dump entire rows with one or more duplicate fields but I don't want to type every field name in the table:

    SELECT * FROM db WHERE col IN
        (SELECT col FROM db GROUP BY col HAVING COUNT(*) > 1)
        ORDER BY col
    
    0 讨论(0)
  • 2020-12-12 21:06

    If you're using SQL Server 2005+, you can use the following code to see all the rows along with other columns:

    SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
    FROM table
    

    Youd can also delete (or otherwise work with) duplicates using this technique:

    WITH cte AS
    (SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
        FROM table
    )
    DELETE FROM cte WHERE DuplicateRowNumber > 1
    

    ROW_NUMBER is extremely powerful - there is much you can do with it - see the BOL article on it at http://msdn.microsoft.com/en-us/library/ms186734.aspx

    0 讨论(0)
  • 2020-12-12 21:18

    AFAIK, it doesn't. Just make a select statement grouping by all the fields of a table, and filtering using a having clause where the count is greater than 1.

    If your rows are duplicated except by the key, then don't include the key in the select fields.

    0 讨论(0)
提交回复
热议问题