Deleting records from one table joined onto another table SQL

谁说我不能喝 posted on 2021-01-28 11:33:10

Question


I have two tables: one with 212,000 records (deprecated records) and the other with 10,500,000 records.

I would like to join the two tables on the id and version_number fields, which both tables have. I was hoping that the matched records could then be deleted, i.e. that all of the 212,000 records get deleted from the 10,500,000.

What would be the best approach for this in Oracle SQL? I have seen examples where an inner join on a single field is combined with a delete statement to remove the rows of one table from another, but none where two fields are used in the join.

Would it make sense to use an outer join before deleting the records? I was thinking this might help me track what has been deleted, if possible.


Answer 1:


You do not need an OUTER JOIN for the delete itself; it is only useful for checking how many rows will and will not be deleted.

An example of such a check query is shown below (it uses the generated test data provided at the end of this answer):

with del as (
select delta.id, delta.version,
decode(big.id,null,0,1) is_deleted
from delta
left outer join big 
on delta.id = big.id and delta.version = big.version
)
select is_deleted, count(*) cnt, max(id||'.'||version) eg_id_vers
from del
group by is_deleted;

IS_DELETED        CNT EG_ID_VERS
---------- ---------- ----------
         1      20000 99995.0
         0         20 100100.0

With your data size you should use a HASH JOIN with full table scan on both tables to get acceptable performance.
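On the question of tracking what gets deleted, one option is to copy the matching rows aside before running the DELETE. The following is only a sketch against the BIG and DELTA test tables from this answer; the table name BIG_DELETED_BACKUP is a hypothetical name chosen for illustration, and the columns ID and VERSION stand in for the id and version_number fields of the real tables.

```sql
-- Assumption: BIG_DELETED_BACKUP is a made-up name, not part of the original answer.
-- Copy every BIG row that has a match in DELTA on both join columns.
create table big_deleted_backup as
select big.*
from big
where exists (select null
              from delta
              where delta.id = big.id and delta.version = big.version);

-- Number of rows that a subsequent DELETE on the same condition would remove.
select count(*) cnt from big_deleted_backup;
```

The count from the backup table can then be compared with the number of rows reported by the DELETE before committing.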

There are basically two ways to do the DELETE:

Updatable Join View

Note that in this case your small table must have a unique index on ID, VERSION (or a primary key):

create unique index delta_idx on delta(id,version);

By contrast, the BIG table should not have such a constraint. This matters because it makes BIG the only key-preserved table in the join view.

Simply put, a join to the small table cannot duplicate rows from the big table, because of the unique constraint.

See the Oracle documentation on updating join views for more information.

delete from 
(
select delta.id, delta.version, big.id big_id, big.version
from big 
join delta 
on delta.id = big.id and delta.version = big.version
)

The delete above removes rows from the BIG table because it is the only key-preserved table (see the discussion above).

This DML leads to a HASH JOIN.

Delete with EXISTS

If your small table has no primary key (i.e. it can contain duplicate rows with the same ID and VERSION), you must fall back to the EXISTS solution proposed in the other answer.

DELETE FROM big 
    WHERE EXISTS (SELECT null
                  FROM delta
                  WHERE delta.id = big.id and delta.version = big.version
                 ) 

No indexes are required, and you should expect an execution plan with HASH JOIN RIGHT SEMI, which means that the two approaches are not really different.
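To see which plan your own database picks, the statement can be explained without being executed. This is a minimal sketch using the BIG and DELTA test tables from this answer; the actual plan depends on your statistics, data volume and Oracle version, so it may differ from the one described above.

```sql
-- EXPLAIN PLAN only records the optimizer's plan; it does not delete any rows.
explain plan for
delete from big
where exists (select null
              from delta
              where delta.id = big.id and delta.version = big.version);

-- Show the plan that was just recorded in the plan table.
select * from table(dbms_xplan.display);
```

The same two steps can be applied to the join-view DELETE to compare the two plans side by side.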

Sample Data for Test

create table big as
select 
trunc(rownum/10) id, mod(rownum,10) version,
lpad('x',10,'Y') pad
from dual connect by level <= 1000000;

/* the DELTA table has 50 times fewer rows;
allow some rows outside the range of the BIG table - those rows match nothing in BIG */
drop table delta;
create table delta as
select 
trunc(rownum*50/10) id, mod(rownum*50,10) version
from dual connect by level <= 1001000/50;

create unique index delta_idx on delta(id,version);



Answer 2:


A simple approach just uses IN or EXISTS:

DELETE FROM bigtable bt
    WHERE EXISTS (SELECT 1
                  FROM littletable lt
                  WHERE bt.? = lt.?
                 );

You want an index on littletable for the keys used for the correlation clause.
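For the two-field case asked about in the question, the IN variant can compare both columns at once. The following is a sketch only: it assumes the BIG and DELTA test tables and the column names ID and VERSION from the first answer; substitute your own table names and your id and version_number columns.

```sql
-- Assumption: table and column names come from the first answer's test data.
-- Delete every BIG row whose (id, version) pair appears in DELTA.
delete from big
where (id, version) in (select id, version from delta);

-- Inspect the reported row count, then COMMIT to keep or ROLLBACK to undo.
rollback;
```

Duplicate (id, version) pairs in DELTA do no harm here, since IN only tests whether a pair is present.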



Source: https://stackoverflow.com/questions/59911501/deleting-records-from-one-table-joined-onto-another-table-sql
