KafkaIO checkpoint - how to commit offsets to Kafka

Posted by 戏子无情 on 2019-12-04 07:24:17

There are two options: enable commitOffsetsInFinalize() in KafkaIO, or alternatively enable auto-commit in the Kafka consumer configuration. Note that while commitOffsetsInFinalize() is more in sync with what has been processed in Beam than Kafka's auto-commit, it does not provide a strong guarantee of exactly-once processing. Imagine a two-stage pipeline: Dataflow finalizes the Kafka reader after the first stage, without waiting for the second stage to complete. If you restart the pipeline from scratch at that point, you would not process the records that completed the first stage but had not yet been processed by the second. The issue is no different for PubsubIO.
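To make the two-stage gap concrete, here is a minimal toy model in plain Java. It is not Beam or Dataflow code: the class name, method name, and record counts are made up for illustration, and it assumes the committed offset simply equals the number of records the first stage has consumed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Beam code) of offsets being committed once stage 1 is done. */
public class FinalizeGapDemo {

    /**
     * Returns the offsets that stage 2 never sees if the pipeline is
     * restarted from scratch after stage 1 has been finalized.
     */
    public static List<Long> lostOffsets(long totalRecords, long stage1Done, long stage2Done) {
        // Finalizing the reader commits everything stage 1 has consumed so far.
        long committedOffset = stage1Done;

        // Stage 2 lags behind: only offsets [0, stage2Done) went through the whole pipeline.
        List<Long> fullyProcessed = new ArrayList<>();
        for (long o = 0; o < stage2Done; o++) {
            fullyProcessed.add(o);
        }

        // Restart from scratch: in-flight state is gone, reading resumes at the committed offset.
        for (long o = committedOffset; o < totalRecords; o++) {
            fullyProcessed.add(o);
        }

        // Anything not in fullyProcessed was skipped.
        List<Long> lost = new ArrayList<>();
        for (long o = 0; o < totalRecords; o++) {
            if (!fullyProcessed.contains(o)) {
                lost.add(o);
            }
        }
        return lost;
    }

    public static void main(String[] args) {
        // 10 records; stage 1 consumed 6, stage 2 finished only 4.
        System.out.println(lostOffsets(10, 6, 4)); // prints [4, 5]
    }
}
```

Offsets 4 and 5 were committed after the first stage but never reached the second, so a from-scratch restart skips them. If both stages had reached the same point, nothing would be lost.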

Regarding option (2): you can configure KafkaIO to start reading from a specific timestamp (assuming the Kafka server supports it, i.e. version 0.10+). But that does not look any better than enabling auto_commit.

That said, KafkaIO should support finalize. It might be simpler to use than enabling auto_commit (where you need to think about commit frequency etc.). We haven't had many users asking for it. Please mention it on user@beam.apache.org if you can.
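As a rough sketch of what "enabling auto_commit" and choosing a frequency involves, the snippet below builds the relevant Kafka consumer settings as a plain map. The keys are standard Kafka consumer configuration names; the class name, group id, and interval are hypothetical. How the map is handed to KafkaIO depends on the Beam version (recent releases have a `withConsumerConfigUpdates` method on the reader), so treat that part as an assumption to verify against your version.

```java
import java.util.HashMap;
import java.util.Map;

/** Builds Kafka consumer settings for periodic auto-commit (illustrative only). */
public class AutoCommitConfig {

    public static Map<String, Object> autoCommitProps(String groupId, int intervalMs) {
        Map<String, Object> props = new HashMap<>();
        // Offsets are committed per consumer group, so a group id is required.
        props.put("group.id", groupId);
        // Let the Kafka consumer commit offsets on its own schedule.
        props.put("enable.auto.commit", true);
        // Commit frequency; shorter intervals narrow the gap between consumed and committed offsets.
        props.put("auto.commit.interval.ms", intervalMs);
        return props;
    }

    public static void main(String[] args) {
        // "my-beam-group" and 1000 ms are placeholder values.
        Map<String, Object> props = autoCommitProps("my-beam-group", 1000);
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Note that auto-commit runs on a timer independent of Beam's processing, which is why it is less in sync with what the pipeline has actually processed than committing on finalize.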

[Update: I am adding support for committing offsets to KafkaCheckpointMark in PR 4481]
