How to delete duplicate rows in SQL Server 2008?

栀梦 2021-01-06 18:26

How can I delete duplicate rows in SQL Server 2008?

4 answers
  •  耶瑟儿~
    2021-01-06 19:19

    The simplest way is with a CTE (common table expression). I use this method when I have raw data to import; the first thing I do to sanitize it is to ensure there are no duplicates, so that I have some sort of unique handle on each row.

    Summary:

    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY [dupe-column-list]
                                  ORDER BY [dupe-column-list]) AS _dupe_num
        FROM [table-name]
        WHERE 1=1
    )
    DELETE FROM numbered WHERE _dupe_num > 1;
    

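    The "dupe-column-list" in the PARTITION BY is where you list all of the columns whose combined values should be unique. The ORDER BY is where you decide, within a set of duplicates, which row "wins" (it gets _dupe_num = 1 and is kept) and which rows get deleted. If the duplicates are identical in every column, it does not matter which one survives, so ordering by the same column list is fine. (The "WHERE 1=1" is just a personal habit; it has no effect on the result.)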
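
    When the duplicates differ in some column outside the dupe-column-list, the ORDER BY does real work. The sketch below assumes a hypothetical temp table #_orders with a hypothetical loaded_at column (neither is from the question) and keeps the most recently loaded row of each set:

    ```sql
    -- Hypothetical table and column names, for illustration only.
    CREATE TABLE #_orders (customer_id int, order_ref varchar(20), loaded_at datetime);
    INSERT INTO #_orders
        VALUES (1, 'A-100', '2021-01-01T00:00:00')
            , (1, 'A-100', '2021-01-05T00:00:00')
            , (2, 'B-200', '2021-01-03T00:00:00');

    -- Rows are duplicates when customer_id and order_ref match.
    -- ORDER BY loaded_at DESC gives the newest row _dupe_num = 1, so it is kept.
    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY customer_id, order_ref
                                  ORDER BY loaded_at DESC) AS _dupe_num
        FROM #_orders
    )
    DELETE FROM numbered WHERE _dupe_num > 1;

    -- Two rows remain: (1, 'A-100', 2021-01-05) and (2, 'B-200', 2021-01-03).
    SELECT customer_id, order_ref, loaded_at FROM #_orders;

    DROP TABLE #_orders;
    ```

    Switching to ORDER BY loaded_at ASC would keep the oldest row instead.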

    It works because SQL Server keeps an internal, unique reference to each source row that's selected in the CTE. So when the DELETE is executed, it knows the exact row to delete, regardless of what you put in the CTE's select list. (If you're nervous, you could change the "DELETE" to "SELECT *", but since you've got duplicate rows, it's not going to help; if you could uniquely identify each row, you wouldn't be reading this.)
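
    Another way to calm the nerves, offered here as a suggestion rather than as part of the method above, is to run the DELETE inside a transaction, inspect the result, and roll back. A minimal sketch, assuming a hypothetical temp table #_sample:

    ```sql
    -- Hypothetical table, for illustration only.
    CREATE TABLE #_sample (col1 int, col2 varchar(10));
    INSERT INTO #_sample VALUES (1, 'a'), (1, 'a'), (2, 'b');

    BEGIN TRANSACTION;

    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2
                                  ORDER BY col1, col2) AS _dupe_num
        FROM #_sample
    )
    DELETE FROM numbered WHERE _dupe_num > 1;

    -- Must come directly after the DELETE; reports 1 row removed here.
    SELECT @@ROWCOUNT AS _rows_deleted;

    -- Inspect what would be left: (1, 'a') and (2, 'b').
    SELECT col1, col2 FROM #_sample;

    -- Undo the delete; use COMMIT TRANSACTION instead once the result looks right.
    ROLLBACK TRANSACTION;

    DROP TABLE #_sample;
    ```

    After the rollback the table holds all three original rows again, so the check can be repeated as often as needed.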

    Example:

    CREATE TABLE ##_dupes (col1 int, col2 int, col3 varchar(50));
    INSERT INTO ##_dupes 
        VALUES (1, 1, 'one,one')
            , (2, 2, 'two,two')
            , (3, 3, 'three,three')
            , (1, 1, 'one,one')
            , (1, 2, 'one,two')
            , (3, 3, 'three,three')
            , (1, 1, 'one,one')
            , (1, 2, '1,2');
    

    Of the 8 rows, 5 are involved in duplicates (one set of 3 and one set of 2), so 3 rows need to be removed. You can see the duplicate sets with this query:

    SELECT col1
        , col2
        , col3
        , COUNT(1) AS _total 
        FROM ##_dupes 
        WHERE 1=1 
        GROUP BY col1, col2, col3
        HAVING COUNT(1) > 1
        ORDER BY _total DESC;
    

    Now run the following query to remove the duplicates, leaving 1 row from each set of duplicates.

    WITH numbered AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3
                                  ORDER BY col1, col2, col3) AS _dupe_num
        FROM ##_dupes
        WHERE 1=1
    )
    DELETE FROM numbered WHERE _dupe_num > 1;
    

    You are now left with 5 rows, none of which are duplicated.
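
    To confirm the result, one option is to rerun the duplicate check and count what is left. This sketch only uses the ##_dupes table from the example above; the _remaining alias is an arbitrary name chosen here:

    ```sql
    -- Should return no rows: every (col1, col2, col3) combination is now unique.
    SELECT col1, col2, col3, COUNT(1) AS _total
        FROM ##_dupes
        GROUP BY col1, col2, col3
        HAVING COUNT(1) > 1;

    -- Should return 5, matching the count stated above.
    SELECT COUNT(1) AS _remaining FROM ##_dupes;

    -- Clean up the global temp table once finished.
    DROP TABLE ##_dupes;
    ```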
