Delete duplicate records based on multiple columns

蓝咒 提交于 2019-12-04 16:05:38

问题


In our system we run hourly imports from an external database. Due to an error in the import scripts, there are now some duplicate records.

A duplicate is deemed where any record has the same :legacy_id and :company.

What code can I run to find and delete these duplicates?

I was playing around with this:

Product.select(:legacy_id,:company).group(:legacy_id,:company).having("count(*) > 1")

It seemed to return some of the duplicates, but I wasn't sure how to delete from there?

Any ideas?


回答1:


You can try the following approach:

Product.where.not(
  id: Product.group(:legacy_id, :company).pluck('min(products.id)')
).delete_all

Or pure sql:

delete from products
where id not in ( 
   select min(p.id) from products p group by p.legacy_id, p.company
)


来源:https://stackoverflow.com/questions/27189786/delete-duplicate-records-based-on-multiple-columns

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!