Deleting duplicate rows from Redshift

南方客 2020-12-31 02:45

I am trying to delete some duplicate data in my Redshift table.

Below is my query:

With duplicates
As
(Select *, ROW_NUMBER() Over (PARTITION by rec

  •  旧时难觅i
    2020-12-31 03:04

    Your query does not work because Redshift does not allow DELETE after a WITH clause. Only SELECT, UPDATE, and a few other statements are allowed (see the WITH clause documentation).

    Solution (in my situation):

    My table events had an id column that is supposed to uniquely identify each record but contained duplicate rows. This id column plays the same role as your record_indicator.

    Unfortunately, I was unable to create a temporary table with SELECT DISTINCT, because it failed with the following error:

    ERROR: Intermediate result row exceeds database block size

    But this worked like a charm:

    -- Copy events into a new table, numbering the rows within each id
    CREATE TABLE temp AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rownumber
        FROM events
    );
    

    resulting in the temp table:

    id | rownumber | ...
    ----------------
    1  | 1         | ...
    1  | 2         | ...
    2  | 1         | ...
    2  | 2         | ...
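    To see how ROW_NUMBER() produces the numbering in the table above, here is a small sketch that runs the same window expression against an in-memory SQLite database. SQLite (3.25 or later, for window functions) is only a stand-in for Redshift here, and the payload column and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Redshift; "payload" and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b")],
)

# Same window expression as the CREATE TABLE AS statement:
# numbering restarts at 1 for every distinct id.
rows = conn.execute(
    """
    SELECT id, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rownumber
    FROM events
    ORDER BY id, rownumber
    """
).fetchall()

for row in rows:
    print(row)  # (id, rownumber) pairs
```

    Every row with rownumber 1 is the one that will be kept; any row numbered 2 or higher is a duplicate of an id already seen.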
    

    Now the duplicates can be deleted by removing the rows whose rownumber is greater than 1:

    -- Keep only the first row of each id
    DELETE FROM temp WHERE rownumber > 1;
    

    After that, remove the helper column with ALTER TABLE temp DROP COLUMN rownumber, rename the tables, and you're done.
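    The whole sequence can be sketched end to end as follows. It again uses an in-memory SQLite database as a stand-in for Redshift so it can be run locally. The sample rows, the payload column, and the table names events_dedup (the answer's temp), events_clean, and events_old are hypothetical. The statements mirror the steps above. The one deliberate difference is step 3: because ALTER TABLE ... DROP COLUMN only exists in SQLite 3.35 and later, the sketch drops the helper column by copying the original columns into a new table. In Redshift you would use ALTER TABLE ... DROP COLUMN directly.

```python
import sqlite3

# In-memory SQLite stands in for Redshift; all names below except
# "events", "id" and "rownumber" are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")],
)

# 1. Copy the table, numbering each row within its id (the answer's temp table).
conn.execute(
    """
    CREATE TABLE events_dedup AS
    SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rownumber
    FROM events
    """
)

# 2. Keep only the first row of each id.
conn.execute("DELETE FROM events_dedup WHERE rownumber > 1")

# 3. Get rid of the helper column. Redshift: ALTER TABLE ... DROP COLUMN rownumber.
#    Here the original columns are copied instead, for older SQLite versions.
conn.execute("CREATE TABLE events_clean AS SELECT id, payload FROM events_dedup")
conn.execute("DROP TABLE events_dedup")

# 4. Swap the tables: move the old one aside, rename the clean one, then drop the old one.
conn.execute("ALTER TABLE events RENAME TO events_old")
conn.execute("ALTER TABLE events_clean RENAME TO events")
conn.execute("DROP TABLE events_old")
conn.commit()

print(conn.execute("SELECT id, payload FROM events ORDER BY id").fetchall())
```

    Renaming events to events_old before dropping it, rather than dropping it first, keeps a copy of the original data around until the deduplicated table has taken its place.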
