Question
I have some data stored in S3. I need to copy this data periodically from S3 to a Redshift cluster. For a one-off bulk copy, I can use the COPY command to load the data from S3 into Redshift.
Is there a similarly simple way to copy data from S3 to Redshift on a schedule?
Thanks
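For reference, here is a minimal Python sketch of the kind of COPY statement the question refers to. It assumes the files sit under a date-partitioned prefix (one folder per day) and that the cluster is allowed to assume an IAM role; the table name, bucket, role ARN and the helpers `daily_prefix` and `build_copy_sql` are all hypothetical. A scheduler would generate and run one such statement per period.

```python
from datetime import date


def daily_prefix(base: str, day: date) -> str:
    """Hypothetical layout: one S3 folder per day, e.g. s3://bucket/events/2024/05/01/."""
    return f"{base.rstrip('/')}/{day:%Y/%m/%d}/"


def build_copy_sql(table: str, s3_prefix: str, iam_role_arn: str,
                   file_format: str = "CSV") -> str:
    """Build a Redshift COPY statement that loads every object under s3_prefix.

    `table` and `file_format` are inserted verbatim, so they must come from
    trusted configuration, not user input.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    # Escape single quotes inside the SQL string literals by doubling them.
    prefix = s3_prefix.replace("'", "''")
    role = iam_role_arn.replace("'", "''")
    return (
        f"COPY {table}\n"
        f"FROM '{prefix}'\n"
        f"IAM_ROLE '{role}'\n"
        f"FORMAT AS {file_format};"
    )


if __name__ == "__main__":
    # Placeholder names: replace with your own table, bucket and role.
    sql = build_copy_sql(
        "public.events",
        daily_prefix("s3://my-bucket/events/", date(2024, 5, 1)),
        "arn:aws:iam::123456789012:role/MyRedshiftRole",
    )
    print(sql)
```

Pointing each run at a fresh prefix matters because COPY does not remember which files it has already loaded: re-running the same statement over the same prefix appends the same rows again.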
Answer 1:
Try using AWS Data Pipeline, which has various templates for moving data from one AWS service to another. The "Load data from S3 into Redshift" template copies data from an Amazon S3 folder into a Redshift table. You can load the data into an existing table or provide a SQL query to create the table. The Redshift table must have the same schema as the data in Amazon S3.
Data Pipeline supports running pipelines on a schedule, and it provides a cron-style editor for defining that schedule.
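As a rough illustration of the scheduling part, a pipeline definition expresses a cron-style schedule through a `Schedule` object that the `Default` object references. The sketch below assumes you register definitions with boto3's `datapipeline` client; it builds only those two objects in the `pipelineObjects` format, and the object ids and the helper name `schedule_objects` are hypothetical. The data nodes, Redshift database, compute resource and copy activity that the "Load data from S3 into Redshift" template would also create are left out, as are fields such as the pipeline roles.

```python
import re
from typing import Dict, List

# Period format: "N minutes|hours|days|weeks|months", e.g. "1 days" or "15 minutes".
_PERIOD = re.compile(r"^\d+ (minutes|hours|days|weeks|months)$")


def schedule_objects(period: str) -> List[Dict]:
    """Return Default + Schedule pipeline objects for a cron-style schedule."""
    if not _PERIOD.match(period):
        raise ValueError(f"unsupported period: {period!r}")
    return [
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                # "cron" scheduling runs each activity at the start of its period.
                {"key": "scheduleType", "stringValue": "cron"},
                {"key": "schedule", "refValue": "DefaultSchedule"},
            ],
        },
        {
            "id": "DefaultSchedule",
            "name": f"Every {period}",
            "fields": [
                {"key": "type", "stringValue": "Schedule"},
                {"key": "period", "stringValue": period},
                {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
            ],
        },
    ]
```

You would combine these objects with the rest of the definition, pass the list to the client's `put_pipeline_definition(pipelineId=..., pipelineObjects=...)`, and then call `activate_pipeline(pipelineId=...)`.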
Answer 2:
I believe Kinesis Firehose is the simplest way to get this done. Simply create a Kinesis Firehose delivery stream, point it at a specific table in your Redshift cluster, write data to the stream, and you're done :)
Full setup procedure here: https://docs.aws.amazon.com/ses/latest/DeveloperGuide/event-publishing-redshift-firehose-stream.html
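To make "write data to the stream" concrete, here is a minimal Python sketch of a producer. It assumes the delivery stream's Redshift COPY options are set up to read newline-delimited JSON (for example with the COPY option `JSON 'auto'`), and it takes a boto3 Firehose client as a parameter; the helper names and the stream name are hypothetical. The only limit it enforces is the 500-records-per-call cap of `PutRecordBatch`; the per-call byte limit is not checked.

```python
import json
from typing import Dict, Iterable, List

MAX_BATCH_RECORDS = 500  # PutRecordBatch accepts at most 500 records per call


def to_firehose_batches(rows: Iterable[dict],
                        max_records: int = MAX_BATCH_RECORDS) -> List[List[Dict[str, bytes]]]:
    """Encode rows as newline-delimited JSON records, grouped into batches."""
    batches: List[List[Dict[str, bytes]]] = []
    current: List[Dict[str, bytes]] = []
    for row in rows:
        # Firehose does not add delimiters between records, so each record
        # carries its own trailing newline.
        current.append({"Data": (json.dumps(row) + "\n").encode("utf-8")})
        if len(current) == max_records:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


def send_rows(client, stream_name: str, rows: Iterable[dict]) -> None:
    """Send rows with a boto3 Firehose client, e.g. boto3.client("firehose")."""
    for batch in to_firehose_batches(rows):
        response = client.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
        if response["FailedPutCount"]:
            # A production producer would retry only the failed entries
            # (listed in response["RequestResponses"]); this sketch just stops.
            raise RuntimeError(f"{response['FailedPutCount']} records were rejected")
```

Usage would look like `send_rows(boto3.client("firehose"), "my-redshift-stream", rows)`. Firehose buffers the records, stages them in S3 and issues the COPY into Redshift for you.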
Answer 3:
The AWS Lambda Redshift Loader is a good solution: it runs a COPY command on Redshift whenever a new file appears in a pre-configured location on Amazon S3.
Links:
https://aws.amazon.com/blogs/big-data/a-zero-administration-amazon-redshift-database-loader/
https://github.com/awslabs/aws-lambda-redshift-loader
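The core idea behind such a loader can be sketched in a few lines of Python. This is not the project's own code, which also handles batching and tracking of already-loaded files; it is a minimal illustration assuming an S3 ObjectCreated notification triggers the function and the statement is submitted through the Redshift Data API. The table, role ARN, cluster, database and user constants are hypothetical placeholders, as is the helper `copy_statements_from_event`.

```python
from typing import List
from urllib.parse import unquote_plus

# Hypothetical configuration: replace with your own values.
TABLE = "public.events"
IAM_ROLE_ARN = "arn:aws:iam::123456789012:role/MyRedshiftRole"
CLUSTER_ID = "my-cluster"
DATABASE = "dev"
DB_USER = "loader"


def copy_statements_from_event(event: dict, table: str, iam_role_arn: str) -> List[str]:
    """Turn an S3 ObjectCreated notification into one COPY statement per new object."""
    statements = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded (a space arrives as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        # Escape single quotes for the SQL string literal.
        path = f"s3://{bucket}/{key}".replace("'", "''")
        statements.append(
            f"COPY {table} FROM '{path}' IAM_ROLE '{iam_role_arn}' FORMAT AS CSV;"
        )
    return statements


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    client = boto3.client("redshift-data")
    ids = []
    for sql in copy_statements_from_event(event, TABLE, IAM_ROLE_ARN):
        # execute_statement is asynchronous: it returns an Id that can be
        # polled with describe_statement to see whether the COPY succeeded.
        response = client.execute_statement(
            ClusterIdentifier=CLUSTER_ID, Database=DATABASE, DbUser=DB_USER, Sql=sql
        )
        ids.append(response["Id"])
    return {"statement_ids": ids}
```

Because COPY treats the `FROM` path as a key prefix, a key that is also the prefix of other keys (say `data.csv` and `data.csv.bak`) would pull those files in too; loading through a manifest file avoids that.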
Source: https://stackoverflow.com/questions/38654865/stream-data-from-s3-bucket-to-redshift-periodically