SQL Server 2008: delete duplicate rows

北荒 2020-12-13 11:29

I have duplicate rows in my table. How can I delete them based on a single column's value?

For example:

uniqueid, col2, col3 ...
1, john, simpson
2, sally, ro         


        
6 Answers
  • 2020-12-13 11:48

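    Here is a simple trick to remove duplicates: copy the distinct rows into a new table. UNION discards rows that are identical in every column, so this only helps when the duplicates match on all columns, not just on uniqueid.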

    select * into NewTable from ExistingTable
    union
    select * from ExistingTable;
    
  • 2020-12-13 11:54

    You can DELETE from a CTE:

    WITH cte AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY uniqueid ORDER BY col2) AS RowRank
                 FROM [Table])  -- [Table] is a placeholder name; TABLE is a reserved word, hence the brackets
    DELETE FROM cte
    WHERE RowRank > 1
    

    The ROW_NUMBER() function assigns a number to each row. PARTITION BY restarts the numbering for each group, so in this case each value of uniqueid starts at 1 and counts up from there. ORDER BY determines the order in which the numbers are assigned within a group. Since each uniqueid is numbered starting at 1, any record with a ROW_NUMBER() greater than 1 has a duplicate uniqueid.

    To get an understanding of how the ROW_NUMBER() function works, just try it out:

    SELECT *, ROW_NUMBER() OVER (PARTITION BY uniqueid ORDER BY col2) AS RowRank
    FROM [Table]  -- placeholder name; brackets needed because TABLE is a reserved word
    ORDER BY uniqueid
    

    You can change the PARTITION BY and ORDER BY clauses to control which record is kept and which are removed.

    For instance, perhaps you'd like to do this in multiple steps, first deleting records with the same last name but different first names. You could add the last name (col3) to the PARTITION BY:

    WITH cte AS (SELECT *, ROW_NUMBER() OVER (PARTITION BY uniqueid, col3 ORDER BY col2) AS RowRank
                 FROM [Table])  -- placeholder name; brackets needed because TABLE is a reserved word
    DELETE FROM cte
    WHERE RowRank > 1
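
    The ORDER BY inside ROW_NUMBER() is what decides which row of each group survives. As a minimal sketch, assuming a hypothetical table variable @t with made-up sample rows in the question's column layout, sorting col2 in descending order keeps 'johnny' instead of 'john':

    ```sql
    -- Hypothetical sample data; @t and its rows are assumptions for illustration.
    DECLARE @t TABLE (uniqueid INT, col2 VARCHAR(20), col3 VARCHAR(20));
    INSERT INTO @t VALUES (1, 'john', 'simpson');
    INSERT INTO @t VALUES (2, 'sally', 'roberts');
    INSERT INTO @t VALUES (1, 'johnny', 'simpson');

    -- With ORDER BY col2 DESC, 'johnny' is numbered 1 and 'john' is numbered 2
    -- within uniqueid = 1, so 'john' is the row that gets deleted.
    ;WITH cte AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY uniqueid ORDER BY col2 DESC) AS RowRank
        FROM @t
    )
    DELETE FROM cte
    WHERE RowRank > 1;

    -- Remaining rows: (1, johnny, simpson) and (2, sally, roberts)
    SELECT uniqueid, col2, col3 FROM @t ORDER BY uniqueid;
    ```

    Switching back to ascending order would keep 'john' instead. Any column that expresses "the row I want to keep" (a timestamp, for example) could be used in the ORDER BY in the same way.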
    
  • 2020-12-13 12:00

    DELETE FROM table WHERE uniqueid='1' AND col2='john'

    Or change col2='john' to col2='johnny'. It depends on which record you want to delete.

    How did you end up with two identical "unique" IDs in the first place?
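
    Before deleting rows by hand, it can help to list which IDs are duplicated and look at the rows involved. A minimal sketch, assuming the question's column names and a hypothetical table variable @t with made-up sample rows:

    ```sql
    -- Hypothetical sample data; @t and its rows are assumptions for illustration.
    DECLARE @t TABLE (uniqueid INT, col2 VARCHAR(20), col3 VARCHAR(20));
    INSERT INTO @t VALUES (1, 'john', 'simpson');
    INSERT INTO @t VALUES (2, 'sally', 'roberts');
    INSERT INTO @t VALUES (1, 'johnny', 'simpson');

    -- Show every row whose uniqueid occurs more than once.
    SELECT t.uniqueid, t.col2, t.col3
    FROM @t AS t
    WHERE t.uniqueid IN (
        SELECT uniqueid
        FROM @t
        GROUP BY uniqueid
        HAVING COUNT(*) > 1
    )
    ORDER BY t.uniqueid, t.col2;
    ```

    With this sample data the query returns the two rows for uniqueid 1, which makes it easier to decide which one to name in the DELETE.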

  • 2020-12-13 12:09

    There are several ways to delete duplicate records. Some of them are shown below.

    Using ROW_NUMBER() and a CTE

    WITH CTE(DuplicateCount) AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY UniqueId ORDER BY UniqueId) AS DuplicateCount
        FROM Table1
    )
    DELETE FROM CTE WHERE DuplicateCount > 1

    Without using a CTE

    DELETE DuplicateCount FROM (
        SELECT ROW_NUMBER() OVER (PARTITION BY UniqueId ORDER BY UniqueId) AS Dup
        FROM Table1
    ) DuplicateCount
    WHERE DuplicateCount.Dup > 1

    Without using ROW_NUMBER() or a CTE (assumes the table has a unique RowId column)

    DELETE FROM Subject WHERE RowId NOT IN (
        SELECT MIN(RowId) FROM Subject GROUP BY UniqueId
    )
    
  • 2020-12-13 12:13

    You probably have a row id that is assigned by the DB upon insertion and is actually unique. I'll call this rowId in my example.

    rowId | uniqueid | col2   | col3
    ----- | -------- | ------ | -------
    1     | 10       | john   | simpson
    2     | 20       | sally  | roberts
    3     | 10       | johnny | simpson
    

    You can remove duplicates by grouping on whatever is supposed to be unique (one column or several), taking one rowId from each group, and deleting every row whose rowId is not among them. The inner query returns exactly one rowId per uniqueid, so the rowIds of the extra duplicate rows are missing from its result. Run the statement as a SELECT first to preview the rows that would be removed, then swap in the commented-out DELETE.

    select * 
    --DELETE 
    FROM MyTable 
    WHERE rowId NOT IN 
    (SELECT MIN(rowId) 
     FROM MyTable 
     GROUP BY uniqueid);
    

    You could also use MAX instead of MIN; that keeps the row with the highest rowId in each group rather than the lowest.
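
    To see the MAX variant in action, here is a minimal sketch. It assumes a hypothetical table variable @MyTable whose rowId is an IDENTITY column, loaded with the sample rows from the table above:

    ```sql
    -- Hypothetical stand-in for MyTable; rowId is generated by IDENTITY here (an assumption).
    DECLARE @MyTable TABLE (
        rowId    INT IDENTITY(1,1),
        uniqueid INT,
        col2     VARCHAR(20),
        col3     VARCHAR(20)
    );
    INSERT INTO @MyTable (uniqueid, col2, col3) VALUES (10, 'john', 'simpson');
    INSERT INTO @MyTable (uniqueid, col2, col3) VALUES (20, 'sally', 'roberts');
    INSERT INTO @MyTable (uniqueid, col2, col3) VALUES (10, 'johnny', 'simpson');

    -- Keep the highest rowId per uniqueid and delete the rest.
    DELETE FROM @MyTable
    WHERE rowId NOT IN (
        SELECT MAX(rowId)
        FROM @MyTable
        GROUP BY uniqueid
    );

    -- Remaining rows: rowId 2 (sally) and rowId 3 (johnny)
    SELECT rowId, uniqueid, col2, col3 FROM @MyTable ORDER BY rowId;
    ```

    With MIN in place of MAX, the same statement would keep rowId 1 (john) and delete rowId 3 (johnny).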

  • 2020-12-13 12:13
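    -- Sample table variable containing a duplicated id
    DECLARE @du TABLE (
        id INT,
        Name VARCHAR(4)
    )

    INSERT INTO @du VALUES(1,'john')
    INSERT INTO @du VALUES(2,'jane')
    INSERT INTO @du VALUES(1,'john')

    -- Number the rows within each id, then delete everything after the first
    ;WITH dup (id,dp)
    AS
    (SELECT id
    , ROW_NUMBER() OVER(PARTITION BY id ORDER BY Name) AS dp
    FROM @du)
    DELETE FROM dup
    WHERE dp > 1

    -- Only one row per id remains
    SELECT *
    FROM @du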
    