Best practice for reading data from Kafka to AWS Redshift


Question


What is the best practice for moving data from a Kafka cluster to a Redshift table? We have continuous data arriving in Kafka and we want to write it to tables in Redshift (it doesn't have to be in real time).

  • Should I use a Lambda function?
  • Should I write a Redshift connector (consumer) that will run on a dedicated EC2 instance? (The downside is that I need to handle redundancy; see the sketch after this list.)
  • Is there some AWS pipeline service for that?
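
For context on the second option, a self-managed consumer typically batches records to S3 and bulk-loads them into Redshift with COPY rather than inserting row by row. A minimal sketch of that pattern, assuming kafka-python, boto3, and psycopg2; the topic, bucket, table, IAM role, and connection details below are placeholders:

```python
import uuid

import boto3
import psycopg2
from kafka import KafkaConsumer

# Placeholder names -- replace with your own topic, bucket, table, and cluster details.
BUCKET = "my-staging-bucket"
TABLE = "events"

consumer = KafkaConsumer(
    "events-topic",
    bootstrap_servers=["kafka:9092"],
    group_id="redshift-loader",
    enable_auto_commit=False,
)
s3 = boto3.client("s3")
conn = psycopg2.connect(
    "dbname=dev host=my-cluster.example.redshift.amazonaws.com port=5439 "
    "user=loader password=secret"
)

batch = []
for message in consumer:
    batch.append(message.value.decode("utf-8"))
    if len(batch) >= 10_000:
        # Stage the batch as newline-delimited JSON in S3 ...
        key = f"staging/{uuid.uuid4()}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body="\n".join(batch).encode("utf-8"))
        # ... then bulk-load it into Redshift with COPY and commit the Kafka offsets.
        with conn.cursor() as cur:
            cur.execute(
                f"COPY {TABLE} FROM 's3://{BUCKET}/{key}' "
                "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "
                "FORMAT AS JSON 'auto'"
            )
        conn.commit()
        consumer.commit()
        batch.clear()
```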

Answer 1:


Kafka Connect is commonly used for streaming data from Kafka to (and from) data stores. It does useful things like automagically managing scale-out, failover, schemas, serialisation, and so on.

This blog shows how to use the open-source JDBC Kafka Connect connector to stream data to Redshift. There is also a community Redshift connector, but I've not tried it.
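
To give a sense of what the Kafka Connect route looks like, a JDBC sink connector pointed at Redshift is registered through the Connect worker's REST API. A rough sketch, assuming the Confluent JDBC sink connector is installed on the worker; the connector name, topic, JDBC URL, and credentials below are placeholders, not values from the blog:

```python
import json

import requests

# Placeholder connector definition -- topic, JDBC URL, and credentials are examples only.
connector = {
    "name": "redshift-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "events-topic",
        "connection.url": "jdbc:redshift://my-cluster.example.redshift.amazonaws.com:5439/dev",
        "connection.user": "loader",
        "connection.password": "secret",
        "insert.mode": "insert",
        "auto.create": "true",  # let the connector create the target table from the record schema
    },
}

# POST the definition to a Kafka Connect worker's REST API (default port 8083).
resp = requests.post(
    "http://connect-worker:8083/connectors",
    data=json.dumps(connector),
    headers={"Content-Type": "application/json"},
)
resp.raise_for_status()
print(resp.json())
```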

This blog shows another approach, not using Kafka Connect.

Disclaimer: I work for Confluent, who created the JDBC connector.



Source: https://stackoverflow.com/questions/51595109/best-practice-for-reading-data-from-kafka-to-aws-redshift
