How to implement Change Data Capture (CDC) using Apache Spark and Kafka?


Question


I am using spark-sql 2.4.1 with Java 1.8, along with spark-sql-kafka-0-10_2.11 version 2.4.3 and kafka-clients 0.10.0.0.

I need to join streaming data with metadata that is stored in RDS, but the RDS metadata can be added to or changed over time.

If I read and load the RDS table data once when the application starts, it will become stale for joining with the streaming data.

I understand that I need to use Change Data Capture (CDC). How can I implement CDC in my scenario?

Any clues or a sample way to implement Change Data Capture (CDC)?

Thanks a lot.


Answer 1:


You can stream a database into Kafka so that the contents of a table, plus every subsequent change, are available on a Kafka topic. From there, the data can be used in stream processing.

You can do CDC in two different ways:

  • Query-based: poll the database for changes, using the Kafka Connect JDBC Source connector (first config sketch below)
  • Log-based: extract changes from the database's transaction log using e.g. Debezium (second config sketch below)
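
To make the two options concrete, here are minimal config sketches for each, as they would be submitted to the Kafka Connect REST API. Every connection detail, table name, column name, and topic prefix below is a placeholder; the property names follow the Confluent JDBC source connector and the Debezium MySQL connector from roughly the era of the versions in the question, so check the connector docs for your versions.

Query-based, with the Confluent JDBC source connector (assumes the table has an incrementing `id` column and an `updated_at` timestamp to detect changes by):

```json
{
  "name": "rds-jdbc-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:mysql://my-rds-host:3306/metadb",
    "connection.user": "user",
    "connection.password": "password",
    "table.whitelist": "meta_table",
    "mode": "timestamp+incrementing",
    "timestamp.column.name": "updated_at",
    "incrementing.column.name": "id",
    "topic.prefix": "rds-",
    "poll.interval.ms": "5000"
  }
}
```

Log-based, with the Debezium MySQL connector (MySQL is an assumption here; Debezium has connectors for Postgres and others with similar configs). Note that query-based polling can miss deletes and intermediate updates between polls, which is why log-based CDC is generally preferred:

```json
{
  "name": "rds-debezium-source",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "my-rds-host",
    "database.port": "3306",
    "database.user": "user",
    "database.password": "password",
    "database.server.id": "184054",
    "database.server.name": "rds_meta",
    "table.whitelist": "metadb.meta_table",
    "database.history.kafka.bootstrap.servers": "broker:9092",
    "database.history.kafka.topic": "schema-changes.metadb"
  }
}
```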

For more details and examples see http://rmoff.dev/ksny19-no-more-silos
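
On the Spark side, here is a minimal sketch (in Java, matching the question's setup) of reading the CDC-fed metadata topic as a second stream and joining it with the main event stream. The topic and column names are assumptions carried over from the configs above; a real job would parse the JSON/Avro change events into typed columns, add watermarks to bound the join state, and keep only the latest version of each metadata row.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CdcJoinSketch {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .appName("cdc-join-sketch")
        .getOrCreate();

    // Main event stream ("events" is a placeholder topic name).
    Dataset<Row> events = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
        .selectExpr("CAST(key AS STRING) AS meta_id",
                    "CAST(value AS STRING) AS event_payload");

    // Metadata topic kept current by the CDC connector; the name
    // "rds_meta.metadb.meta_table" follows Debezium's
    // <server>.<database>.<table> naming from the sketch above.
    Dataset<Row> meta = spark.readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "rds_meta.metadb.meta_table")
        .load()
        .selectExpr("CAST(key AS STRING) AS id",
                    "CAST(value AS STRING) AS meta_payload");

    // Inner stream-stream join on the metadata key (supported since
    // Spark 2.3). Without watermarks the join state grows without
    // bound, so production jobs should add them.
    Dataset<Row> joined = events.join(meta,
        events.col("meta_id").equalTo(meta.col("id")));

    joined.writeStream()
        .format("console")
        .start()
        .awaitTermination();
  }
}
```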



Source: https://stackoverflow.com/questions/59071338/how-to-implement-change-data-capture-cdc-using-apache-spark-and-kafka
