问题
In our system we run hourly imports from an external database. Due to an error in the import scripts, there are now some duplicate records.
A duplicate is deemed where any record has the same :legacy_id
and :company
.
What code can I run to find and delete these duplicates?
I was playing around with this:
Product.select(:legacy_id,:company).group(:legacy_id,:company).having("count(*) > 1")
It seemed to return some of the duplicates, but I wasn't sure how to delete from there?
Any ideas?
回答1:
You can try the following approach:
Product.where.not(
id: Product.group(:legacy_id, :company).pluck('min(products.id)')
).delete_all
Or pure sql:
delete from products
where id not in (
select min(p.id) from products p group by p.legacy_id, p.company
)
来源:https://stackoverflow.com/questions/27189786/delete-duplicate-records-based-on-multiple-columns