Can we delete duplicate records from a table in teradata without using intermediate table

问题

Can we delete duplicate records from a multiset table in teradata without using intermediate table.

Suppose we have 2 rows with values 1, 2, 3 and 1, 2, 3 in my multiset table then after delete i should have only one row i.e. 1, 2, 3.

回答1:

You can't unless the ROWID usage has been enabled on your system (and probablity is quite low). You can easily test it by trying to explain a SELECT ROWID FROM table;

Otherwise there are two possible ways.

Low number of duplicates:

create a new table as result of SELECT all columns FROM table GROUP BY all columns HAVING COUNT(*) > 1;
DELETE FROM tab WHERE EXISTS (SELECT * FROM newtab WHERE...)
INSERT INTO tab SELECT * FROM newtab

High number of duplicates:

copy to a new table using SELECT DISTINCT * or copy to a SET TABLE to get rid of the duplicates and then re-INSERT back

回答2:

Use the same approach, but create a volatile table in the middle.

CREATE VOLATILE MULTISET TABLE TEMPDB.TEMP_DUP_ID (
Row_ID DECIMAL(31,0) ) PRIMARY INDEX (Row_ID) ON COMMIT PRESERVE ROWS;

INSERT INTO TEMPDB.TEMP_DUP_ID SELECT ROW_ID FROM DB.TABLE T QUALIFY ROW_NUMBER() OVER (PARTITION BY DUP ORDER BY DUP DESC) > 1

Then use the table to delete.

Ideally you will have unique key per row, otherwise, you will need to manipulate the data a bit more to generate one (with row_number() for instance... This is just a recommendation).

回答3:

---Without creating intermediate table

delete FROM ORGINAL_TABLE WHERE (COL1, 2) in (select COL1, count() from ORGINAL_TABLE GROUP BY 1 HAVING COUNT() >1 ) and DUPLICATE_BASED_COL >1; -------Delete one row(keep it)

If you have duplicates and want to delete one row, then we need to use the last line in the sql, if we want to delete both rows than, ignore the condition.

来源：https://stackoverflow.com/questions/19537116/can-we-delete-duplicate-records-from-a-table-in-teradata-without-using-intermedi

标签

teradata