问题
I have the following SQL to delete duplicate values form a table,
DELETE p1
FROM `ProgramsList` p1, `ProgramsList` p2
WHERE p1.CustId = p2.CustId
AND p1.CustId = 1
AND p1.`Id`>p2.`Id`
AND p1.`ProgramName` = p2.`ProgramName`;
Id
is auto incremental
for a given CustId
ProgramName
must be unique (currently it is not)
The above SQL takes about 4 to 5 hours to complete with about 1,000,000 records
Could anyone suggest a quicker way of deleting duplicates from a table?
回答1:
First, You might try adding indexes to ProgramName and CustID fields if you don't already have them.
De-Duping
You can group your records to identify dupes, and as you are doing that, grab the min ID value for each group. Then, just delete all records whose ID is not one of the MinID's.
In-Clause Method
delete from
ProgramsList
where
id not in
(select min(id) as MinID
from ProgramsList
group by ProgramName, CustID)
Join-Method
You may have to run this more than once, if there are many members per group.
DELETE P
FROM ProgramsList as P
INNER JOIN
(select count(*) as Count, max(id) as MaxID
from ProgramsList
group by ProgramName, CustID) as A on A.MaxID = P.id
WHERE A.Count >= 2
Some people have performance issues with the In-Clause, some don't. It depends a lot on your indexes and such. If one is too slow, try the other.
Related: https://stackoverflow.com/a/4192849/127880
回答2:
This will remove all the duplicates in one go.
From the inner query an ID is got which is not deleted and the rest is deleted for each of the program.
delete p from ProgramsList as p
INNER JOIN (select ProgramName as Pname, max(id) as MaxID
from ProgramsList
group by ProgramName, CustID order by null) as A on Pname=P.ProgramName
where A.MaxID != P.id
来源:https://stackoverflow.com/questions/14816863/what-is-the-best-way-to-delete-duplicate-values-from-mysql-table