I'm trying to make sql who will delete all duplicate titles BUT must delete duplicates with these conditions:
- must delete only duplicates with same object_id
- must keep only the newest record (biggest topic_id) (topic_id is the unique id for every topic AI)
So far I've done that (testing with select...)
SELECT topic_id,object_id,title,url,date
FROM topics GROUP BY title
HAVING ( COUNT(title) > 1)
ORDER BY topic_id DESC
But doesn't meet the conditions.
I'm using mysql.
In MySQL
, you cannot specify the target table to a DML
operation in a subquery (unless you nest it more than one level deep, but in this case you won't get reliable results and cannot use correlated subqueries).
Use a JOIN
:
DELETE td
FROM topics td
JOIN topics ti
ON ti.object_id = td.object_id
AND ti.title = td.title
AND ti.topic_id > td.topic_id;
Create an index on topics (object_id, title, topic_id)
for this to work fast.
This will delete all duplicate object_id records save the one with the highest topic_id.
delete from topics outer where exists
(
select 1 from topics inner where
outer.object_id = inner.object_id AND
inner.topic_id < outer.topic_id
)
First, if you have a date field, you'd do better identifying the newest records by their date.
This will work :
SELECT topic_id, object_id, title, url, date
FROM topics earlier
WHERE EXISTS
(SELECT newest.topic_id
FROM topics newest
WHERE newest.date > earlier.date
AND newest.object_id = earlier.object_id)
You're selecting the lines for which exists another line with the same object_id and a more recent date.
WITH tbl AS (SELECT topic_id, object_id, row_number() over(partition by object_id order by topic_id DESC) as rnum
FROM topics)
DELETE tbl WHERE rnum > 1
For more information please check this article: http://blog.sqlauthority.com/2009/06/23/sql-server-2005-2008-delete-duplicate-rows/
来源:https://stackoverflow.com/questions/4133687/delete-all-duplicate-topics-with-few-conditions