I am trying to delete some duplicate data in my Redshift table.
Below is my query:
With duplicates
As
(Select *, ROW_NUMBER() Over (PARTITION by rec
Your query does not work because Redshift does not allow DELETE after the WITH clause. Only SELECT, UPDATE, and a few other statements are allowed (see the WITH clause documentation).
Solution (in my situation):
My table events contained duplicate rows, and it has an id column that uniquely identifies each record. This id column plays the same role as your record_indicator.
Unfortunately, I was unable to create a temporary table using SELECT DISTINCT because I ran into the following error:
ERROR: Intermediate result row exceeds database block size
But this worked like a charm:
CREATE TABLE temp AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rownumber
    FROM events
);
resulting in the temp table:
id | rownumber | ...
----------------
1 | 1 | ...
1 | 2 | ...
2 | 1 | ...
2 | 2 | ...
Now the duplicates can be deleted by removing the rows that have a rownumber larger than 1:
DELETE FROM temp WHERE rownumber > 1;
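Before swapping anything, it may be worth confirming that the delete did what was intended. The following is only a sanity-check sketch that reuses the temp table and id column from above; the alias copies is made up for illustration. If the deduplication worked, it should return no rows.

```sql
-- Sanity check: list any id that still appears more than once in temp.
-- "copies" is just an illustrative alias, not part of the original answer.
SELECT id, COUNT(*) AS copies
FROM temp
GROUP BY id
HAVING COUNT(*) > 1;
```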
After that, rename the tables and you're done.
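The rename step could look roughly like the sketch below. The name events_old is an assumption used here for the backup of the original table, and dropping the helper column rownumber is an extra cleanup step not mentioned above. Note also that a table created with CREATE TABLE AS does not carry over things like constraints or default values from events, so those would need to be re-created if you rely on them.

```sql
-- Keep the original table around under a hypothetical backup name.
ALTER TABLE events RENAME TO events_old;

-- Promote the deduplicated copy to the original name.
ALTER TABLE temp RENAME TO events;

-- The helper column is no longer needed once the duplicates are gone.
ALTER TABLE events DROP COLUMN rownumber;

-- Only once you have verified the new table:
-- DROP TABLE events_old;
```

Keeping events_old until the new table has been checked makes it possible to rename it back if something went wrong.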