MySQL Delete duplicates in consecutive rows

Question

Suppose this table:

ID ColA ColB
1   7    8
2   7    9
3   7    9
4   5    8
5   6    9
6   6    9
7   5    4

The PK is the ID column. Now, I want to delete all duplicates of ColA and ColB in consecutive rows.

In this example, rows 2 and 3 and rows 5 and 6 contain duplicates. The duplicates should be removed so that only the row with the higher ID remains.

The output should be:

ID ColA ColB
1   7    8

3   7    9
4   5    8

6   6    9
7   5    4

How can this be done with MySQL?

Thanks, Juergen

Answer 1:

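-- Selects the lower ID of each pair of consecutive rows with equal ColA and ColB,
-- so that deleting these IDs keeps the row with the higher ID
SELECT 
    ID
FROM
    MyTable m1
WHERE
    0 < (SELECT 
            COUNT(*)
        FROM
            MyTable m2
        WHERE
            m2.ID = m1.ID + 1 AND 
            m2.ColA = m1.ColA AND 
            m2.ColB = m1.ColB)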

and then you can use a

delete from MyTable where ID in ...

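query. Note that MySQL will not let the subquery read directly from the table you are deleting from (error 1093), so put the selected IDs into a temporary table first.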
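
As an alternative to the two-step approach, MySQL's multi-table DELETE can join the table to itself, which avoids a subquery on the table being deleted from. The following is a minimal sketch, not a tested solution for every setup. It assumes the table is called MyTable as in this answer, that the IDs are contiguous (no gaps), and that ColA and ColB contain no NULLs, since = never matches NULL.

```sql
-- Sketch only: MyTable, ColA and ColB are the names used in this thread.
-- m1 is the row to delete, m2 is the row directly after it (ID + 1).
-- Deleting m1 keeps the higher ID of each consecutive pair.
DELETE m1
  FROM MyTable AS m1
  JOIN MyTable AS m2
    ON m2.ID   = m1.ID + 1
   AND m2.ColA = m1.ColA
   AND m2.ColB = m1.ColB;
```

With the sample data from the question this matches the pairs (2, 3) and (5, 6) and deletes rows 2 and 5. In a run of three or more identical rows every row except the last one has a matching successor, so only the highest ID of the run survives.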

Answer 2:

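-- MyTable stands in for your table name (TABLE itself is a reserved word in MySQL)
CREATE TEMPORARY TABLE duplicates (id INT PRIMARY KEY);

-- Collect the lower ID of every pair of consecutive rows with equal ColA and ColB
INSERT INTO duplicates (id)
    SELECT t1.id
      FROM MyTable t1
      JOIN MyTable t2 ON t2.id = t1.id + 1
     WHERE t1.ColA = t2.ColA
       AND t1.ColB = t2.ColB;

-- SELECT * FROM duplicates;  -- check the list first; if you are happy with it, delete
DELETE MyTable
  FROM MyTable
  JOIN duplicates ON MyTable.id = duplicates.id;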
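
The join on id + 1 assumes the IDs have no gaps. If rows were deleted earlier, two neighbouring rows might have IDs 3 and 5, and the join would miss that pair. A possible variant is sketched below; it compares each row with the next existing ID instead. MyTable is an assumed table name, and t3 is an alias introduced only for this illustration.

```sql
-- Sketch only: MyTable is an assumed table name; duplicates is the
-- temporary table used in this answer.
-- Use this INSERT in place of the one above, not in addition to it.
CREATE TEMPORARY TABLE IF NOT EXISTS duplicates (id INT PRIMARY KEY);

INSERT INTO duplicates (id)
    SELECT t1.ID
      FROM MyTable AS t1
      JOIN MyTable AS t2
        ON t2.ColA = t1.ColA
       AND t2.ColB = t1.ColB
     -- t2 must be the row that directly follows t1, whatever its ID is
     WHERE t2.ID = (SELECT MIN(t3.ID)
                      FROM MyTable AS t3
                     WHERE t3.ID > t1.ID);
```

The correlated subquery runs once per row, so on a large table this is likely to be slower than the plain id + 1 join.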

Answer 3:

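Depending on how many records you have, this might not be the most efficient:

SELECT (SELECT id FROM MyTable WHERE colA = m.colA AND colB = m.colB ORDER BY id DESC LIMIT 1) AS id, m.*
FROM (SELECT DISTINCT colA, colB
      FROM MyTable) m

I usually use MSSQL, so there might still be syntax errors, but the idea should be similar. MyTable stands in for your table name. Note that this returns the highest ID for each distinct combination of colA and colB, whether or not the duplicate rows are consecutive.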

Answer 4:

I've called the first table 'test'.

First, create a table that holds every distinct combination of ColA and ColB:

create temporary table tmpTable (ColA int, ColB int);
insert into tmpTable select ColA,ColB from test group by ColA, ColB;

Now, select the maximum ID in the original table for each distinct combination of ColA and ColB. Put these IDs into a new table (called idsToKeep because these are the rows we do not want to delete):

create temporary table idsToKeep (ID int);
insert into idsToKeep select (select max(ID) from test where test.ColA=tmpTable.ColA and test.ColB=tmpTable.ColB) from tmpTable;

Finally, delete all the entries from the original table that are not in the idsToKeep table:

delete from test where ID <> all (select ID from idsToKeep);
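
Before running the final delete, it can help to preview which rows would be removed. The steps above keep the highest ID for each combination of ColA and ColB across the whole table, so a combination that reappears later, with different rows in between, also loses its earlier occurrence. A minimal sketch of such a preview follows. It assumes the table is called test as in this answer; k and maxID are names chosen only for illustration.

```sql
-- Sketch only: lists every row that is not the highest ID for its
-- (ColA, ColB) combination, i.e. the rows the delete above would remove,
-- whether or not the duplicates are adjacent.
SELECT t.*
  FROM test AS t
  JOIN (SELECT ColA, ColB, MAX(ID) AS maxID
          FROM test
         GROUP BY ColA, ColB) AS k
    ON k.ColA = t.ColA
   AND k.ColB = t.ColB
 WHERE t.ID < k.maxID;
```

With the sample data from the question this returns rows 2 and 5, the same rows a consecutive-only delete removes, because no combination recurs non-adjacently there.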

Source: https://stackoverflow.com/questions/8243472/mysql-delete-duplicates-in-consecutive-rows

Tags

mysql

duplicates

delete-row