Copy data from Amazon S3 to Redshift and avoid duplicate rows

Unresolved · 4 answers · 1101 views

深忆病人 · 2021-02-01 10:03

I am copying data from Amazon S3 to Redshift. During this process, I need to avoid loading the same files again. I don't have any unique constraints on my Redshift table.

4 Answers
  •  情书的邮戳
    2021-02-01 10:22

    Mmm..

    What about just never loading data into your master table directly?

    Steps to avoid duplication:

    1. Begin a transaction.
    2. Bulk load the new files into a temporary staging table.
    3. Delete the rows from the master table that also exist in the staging table (join on your key columns).
    4. Insert the staging table's rows into the master table (the merge).
    5. Drop the staging table.
    6. End the transaction.

    This is also quite fast, and it's the approach recommended by the Redshift docs.
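
    A minimal SQL sketch of that merge pattern follows. The table name (events), the key column (event_id), the S3 prefix, and the IAM role are placeholders for illustration, not values from the original question:

        -- 1. Do everything inside one transaction so readers never see a partial merge.
        BEGIN TRANSACTION;

        -- 2. Bulk load the new files into a temporary staging table that mirrors the target.
        --    (Table name, S3 path, and IAM role below are assumptions.)
        CREATE TEMP TABLE stage (LIKE events);

        COPY stage
        FROM 's3://my-bucket/events/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        FORMAT AS CSV;

        -- 3. Delete the master rows that are about to be re-inserted, matching on the key column.
        DELETE FROM events
        USING stage
        WHERE events.event_id = stage.event_id;

        -- 4. Insert everything from the staging table into the master table (the merge).
        INSERT INTO events
        SELECT * FROM stage;

        -- 5. Drop the staging table (temp tables also disappear at session end).
        DROP TABLE stage;

        -- 6. Commit.
        END TRANSACTION;

    Since Redshift does not enforce unique constraints, the DELETE/INSERT pair is what actually guarantees no duplicates, so the column(s) you join on must uniquely identify a row in your data.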
