Stream data from S3 bucket to Redshift periodically


Question


I have some data stored in S3. I need to clone/copy this data periodically from S3 to a Redshift cluster. For a bulk copy, I can use the COPY command to load from S3 into Redshift.
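For reference, a minimal sketch of that one-off bulk load, assuming psycopg2 and a hypothetical table, bucket, and IAM role ARN:

    import psycopg2

    # Connect to the Redshift cluster (endpoint, database, and credentials
    # are placeholders).
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="mydb", user="admin", password="...",
    )

    # COPY loads every file under the S3 prefix into the target table in one pass.
    copy_sql = """
        COPY events
        FROM 's3://my-bucket/events/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        FORMAT AS CSV;
    """
    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)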

Similarly, is there a trivial way to copy data from S3 to Redshift periodically?

Thanks


Answer 1:


Try AWS Data Pipeline, which provides various templates for moving data from one AWS service to another. The "Load data from S3 into Redshift" template copies data from an Amazon S3 folder into a Redshift table. You can load the data into an existing table or provide a SQL query to create the table. The Redshift table must have the same schema as the data in Amazon S3.

Data Pipeline supports running pipelines on a schedule, and it has a cron-style editor for configuring that schedule.
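As a rough illustration of the scheduling side only, a boto3 sketch that creates a pipeline, registers an hourly Schedule object, and activates it. This is a fragment, not a complete definition: the pipeline name is hypothetical, and the S3 input, Redshift output, and copy activity objects that the template would supply are omitted:

    import boto3

    dp = boto3.client("datapipeline", region_name="us-east-1")

    # Create an empty pipeline shell; uniqueId guards against duplicate creation.
    pipeline_id = dp.create_pipeline(
        name="s3-to-redshift", uniqueId="s3-to-redshift-demo"
    )["pipelineId"]

    # Register only the Schedule object here; a working definition would also
    # contain the template's S3/Redshift objects and the copy activity.
    dp.put_pipeline_definition(
        pipelineId=pipeline_id,
        pipelineObjects=[{
            "id": "HourlySchedule",
            "name": "HourlySchedule",
            "fields": [
                {"key": "type", "stringValue": "Schedule"},
                {"key": "period", "stringValue": "1 hours"},
                {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
            ],
        }],
    )

    dp.activate_pipeline(pipelineId=pipeline_id)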




Answer 2:


I believe Kinesis Firehose is the simplest way to get this done. Simply create a Kinesis Firehose delivery stream, point it at a specific table in your Redshift cluster, write data to the stream, and you're done :)

Full setup procedure here: https://docs.aws.amazon.com/ses/latest/DeveloperGuide/event-publishing-redshift-firehose-stream.html
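Once the delivery stream exists and points at the Redshift table, producing data is a single API call. A minimal boto3 sketch, with a hypothetical stream name and record shape:

    import json
    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # Firehose buffers records, stages them to S3, and then COPYs them into
    # Redshift automatically; the stream name here is a placeholder.
    firehose.put_record(
        DeliveryStreamName="my-redshift-stream",
        Record={"Data": (json.dumps({"user_id": 42, "event": "click"}) + "\n").encode()},
    )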




Answer 3:


AWS Lambda Redshift Loader is a good solution: it runs a COPY command on Redshift whenever a new file appears in a pre-configured location on Amazon S3.

Links:

https://aws.amazon.com/blogs/big-data/a-zero-administration-amazon-redshift-database-loader/
https://github.com/awslabs/aws-lambda-redshift-loader
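To illustrate the core idea only (this is not the loader's actual code): a hypothetical Lambda handler that, for each S3 object-created event, issues a COPY for just that file. Table name, role ARN, and connection details are placeholders:

    import psycopg2

    def handler(event, context):
        # Connect inside the handler for simplicity; a real function would
        # reuse the connection across invocations.
        conn = psycopg2.connect(
            host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
            port=5439, dbname="mydb", user="admin", password="...",
        )
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Unlike the bulk load above, COPY only the file that
            # triggered this event.
            sql = f"""
                COPY events
                FROM 's3://{bucket}/{key}'
                IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
                FORMAT AS CSV;
            """
            with conn, conn.cursor() as cur:
                cur.execute(sql)
        conn.close()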



Source: https://stackoverflow.com/questions/38654865/stream-data-from-s3-bucket-to-redshift-periodically
