I am trying to delete some duplicate data in my redshift table.
Below is my query:-
With duplicates
As
(Select *, ROW_NUMBER() Over (PARTITION by rec
Your query does not work because Redshift does not allow DELETE after the WITH clause. Only SELECT and UPDATE and a few others are allowed (see WITH clause)
Solution (in my situation):
I did have an id column on my table events that contained duplicate rows and uniquely identifies the record. This column id is the same as your record_indicator.
Unfortunately I was unable to create a temporary table because I ran into the following error using SELECT DISTINCT:
ERROR: Intermediate result row exceeds database block size
But this worked like a charm:
CREATE TABLE temp as (
SELECT *,ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rownumber
FROM events
);
resulting in the temp table:
id | rownumber | ...
----------------
1 | 1 | ...
1 | 2 | ...
2 | 1 | ...
2 | 2 | ...
Now the duplicates can be deleted by removing the rows having rownumber larger than 1:
DELETE FROM temp WHERE rownumber > 1
After that rename the tables and your done.