Copy data from Amazon S3 to Redshift and avoid duplicate rows

Unresolved · 4 answers · 1101 views

深忆病人 · 2021-02-01 10:03

I am copying data from Amazon S3 to Redshift. During this process, I need to avoid loading the same files again. I don't have any unique constraints on my Redshift table.

4 Answers
  •  情书的邮戳
    2021-02-01 10:22

    Mmm..

    What about just never loading data into your master table directly?

    Steps to avoid duplication:

    1. Begin a transaction.
    2. Bulk load the new files into a temporary staging table.
    3. Delete the rows from the master table that also exist in the staging table (join on your key columns).
    4. Insert the staging table's rows into the master table (the merge).
    5. Drop the staging table.
    6. End the transaction.

    This is also quite fast, and it's the approach recommended by the Redshift docs.
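
    A minimal SQL sketch of that merge pattern follows. The table name (events), the key column (event_id), the S3 prefix, and the IAM role are placeholders for illustration, not values from the original question:

        -- 1. Do everything inside one transaction so readers never see a partial merge.
        BEGIN TRANSACTION;

        -- 2. Bulk load the new files into a temporary staging table that mirrors the target.
        --    (Table name, S3 path, and IAM role below are assumptions.)
        CREATE TEMP TABLE stage (LIKE events);

        COPY stage
        FROM 's3://my-bucket/events/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        FORMAT AS CSV;

        -- 3. Delete the master rows that are about to be re-inserted, matching on the key column.
        DELETE FROM events
        USING stage
        WHERE events.event_id = stage.event_id;

        -- 4. Insert everything from the staging table into the master table (the merge).
        INSERT INTO events
        SELECT * FROM stage;

        -- 5. Drop the staging table (temp tables also disappear at session end).
        DROP TABLE stage;

        -- 6. Commit.
        END TRANSACTION;

    Since Redshift does not enforce unique constraints, the DELETE/INSERT pair is what actually guarantees no duplicates, so the column(s) you join on must uniquely identify a row in your data.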
