SQL: How to update duplicates in a separate table?

Posted by 泄露秘密 on 2019-12-11 03:47:25

Question


I have 2 tables:

Table1:    
id1 | id2
1   | a
2   | a
3   | a
4   | b
5   | b

Table2:    
data | id1
...  | 1
...  | 2
...  | 2
...  | 3
...  | 4
...  | 5

Table1 should hold a unique id1->id2 association, but for some unknown reason it doesn't. I need to fix that and add a unique constraint. I want to keep only a one-to-one relation in Table1 and update the duplicated ids in Table2 so that they point to the id that remains in Table1. As a result I should have:

Table1:    
id1 | id2
1   | a
4   | b

Table2:    
data | id1
...  | 1
...  | 1
...  | 1
...  | 1
...  | 4
...  | 4

I know how to find duplicated ids:

SELECT id2 FROM Table1 GROUP BY id2 HAVING COUNT(*) > 1;

But I'm a bit lost on how to do the update and removal that come next.

Data types of id1 and id2 are UUID.


Answer 1:


Think of the problem as keeping the first relationship. Then, the delete is not so hard:

delete from table1
     where table1.id1 > (select min(tt1.id1) from table1 tt1 where tt1.id2 = table1.id2);
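
Before running the delete, it may help to preview which rows it would remove. The following is a minimal sketch, assuming the table and column names from the question. It ranks rows with row_number() ordered by id1 instead of calling min(), because ordering works on any comparable type and the question says the ids are UUIDs:

```sql
-- Preview only: lists the table1 rows that the delete above would remove.
-- rn = 1 marks the row kept for each id2 (smallest id1); rn > 1 would go.
select id1, id2
from (
      select t1.id1,
             t1.id2,
             row_number() over (partition by t1.id2 order by t1.id1) as rn
      from table1 t1
     ) ranked
where rn > 1
order by id2, id1;
```

On the sample data from the question this lists the rows with id1 = 2, 3 and 5.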

Now, to fix table2, we want a more complicated query that saves the results from this. Fortunately, Postgres allows CTEs to contain data modification steps:

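with todelete as (
      -- for every table1 row, compute the id1 that is kept for its id2
      select t1.*, min(t1.id1) over (partition by t1.id2) as keepid
      from table1 t1
     ),
     d as (
      -- remove every row except the one with the smallest id1 per id2
      delete from table1
      where table1.id1 > (select min(tt1.id1) from table1 tt1 where tt1.id2 = table1.id2)
     )
-- repoint table2 at the id1 that is kept
update table2
    set id1 = (select keepid from todelete where todelete.id1 = table2.id1);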
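
An alternative that avoids data-modifying CTEs is to repoint table2 first, then delete, then add the constraint the question asks for, all in one transaction. The following is only a sketch under assumptions: the temporary table name keep_map and the constraint name table1_id2_key are made up for illustration, unique (id2) is my reading of the constraint the question wants, and first_value() ordered by id1 stands in for min() as the rule for picking the row to keep:

```sql
begin;

-- Map every table1.id1 to the id1 that is kept for its id2.
-- keep_map is a hypothetical name; the table is dropped at commit.
create temporary table keep_map on commit drop as
select t1.id1,
       first_value(t1.id1) over (partition by t1.id2 order by t1.id1) as keepid
from table1 t1;

-- Repoint table2 rows that reference an id1 about to be removed.
update table2 t2
   set id1 = m.keepid
  from keep_map m
 where m.id1 = t2.id1
   and m.keepid <> t2.id1;

-- Remove the table1 rows that are not the kept row for their id2.
delete from table1 t1
 using keep_map m
 where m.id1 = t1.id1
   and m.keepid <> t1.id1;

-- With the duplicates gone, enforce one row per id2.
-- table1_id2_key is a hypothetical constraint name.
alter table table1 add constraint table1_id2_key unique (id2);

commit;
```

Updating before deleting means table2 never points at a missing row, which matters if there is a foreign key from table2.id1 to table1.id1.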



Answer 2:


The simplest way would be to write a proc, which would do the following.

1) find distinct id2 from table1.

2) For each distinct id2, you should start a loop which should do following

  • for id2 (let's say 'a'), find all the id1 values in Table1 and store them in a variable (like 1,2,3). Keep the lowest id1 (1 in this case) in another variable. Now generate an update statement for table2: set id1 = lowestid where id1 in (list of ids 1,2,3)

    Once update statement is generated, then execute it and commit.

Once the updates are done, you can do the delete using Gordon's query.

If there are many rows to be updated, you can keep a counter in the loop, build the update statements separated by ';', append them to a variable/cursor, and execute them in batches of 100 - 200 rows, depending on your data.

I am not a PostgreSQL guy, so please excuse any obvious mistakes regarding the proc. But the logic should work.
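
The loop described above could be sketched in PL/pgSQL roughly as follows. This is an illustration under assumptions, not a tested procedure: it assumes the ids are of type uuid as the question states, the variable names are made up, and it runs as a single DO block (one transaction) rather than committing after every update. The delete is left to the separate query, as the answer suggests:

```sql
do $$
declare
    grp      record;   -- one row per duplicated id2 (hypothetical name)
    all_ids  uuid[];   -- every id1 that belongs to the current id2
    keep_id  uuid;     -- the lowest id1, which is the one to keep
begin
    -- Step 1: find each id2 that has more than one id1, with its ids sorted
    for grp in
        select t1.id2,
               array_agg(t1.id1 order by t1.id1) as ids
        from table1 t1
        group by t1.id2
        having count(*) > 1
    loop
        all_ids := grp.ids;
        keep_id := all_ids[1];   -- arrays are 1-based; first element is lowest

        -- Step 2: repoint table2 rows that use any of the other ids
        update table2
           set id1 = keep_id
         where id1 = any (all_ids)
           and id1 <> keep_id;
    end loop;
end
$$;
```

Since a plain UPDATE already handles all matching rows in one statement, batching 100 - 200 statements at a time is probably only worth considering for very large tables.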



Source: https://stackoverflow.com/questions/32606249/sql-how-to-update-duplicates-at-separate-table
