Camus Migration - Kafka HDFS Connect does not start from the set offset

Submitted by 允我心安 on 2019-12-01 11:07:31

Question


I am currently using the Confluent HDFS Sink Connector (v4.0.0) to replace Camus. We are dealing with sensitive data, so we need to keep offsets consistent during the cutover to connectors.

Cutover plan:

  1. We created an HDFS sink connector subscribed to a topic, writing to a temporary HDFS file. This creates a consumer group with name connect-
  2. We stopped the connector using a DELETE request.
  3. Using the /usr/bin/kafka-consumer-groups script, I am able to set the current offset of the connector consumer group's topic partition to a desired value (i.e. last offset Camus wrote + 1).
  4. When I restart the HDFS sink connector, it continues reading from the last committed connector offset and ignores the value I set. I am expecting the HDFS file name to be like: hdfs_kafka_topic_name+kafkapartition+Camus_offset+Camus_offset_plus_flush_size.format
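To make the expected file name in step 4 concrete, here is a minimal Python sketch that builds a name in the `topic+partition+start+end.format` shape described above. The helper `expected_file_name`, the 10-digit zero padding, the `avro` extension, the sample offsets, and the inclusive end offset (start + flush size - 1) are all assumptions for illustration, not values taken from the question.

```python
def expected_file_name(topic, partition, start_offset, end_offset,
                       extension="avro", pad=10):
    """Build a name shaped like topic+partition+start+end.extension.

    The zero-padding width and the extension are assumptions for illustration.
    """
    return (f"{topic}+{partition}"
            f"+{start_offset:0{pad}d}+{end_offset:0{pad}d}.{extension}")


# Hypothetical values: last offset Camus wrote and the connector's flush size.
camus_last_offset = 41999
flush_size = 1000

start = camus_last_offset + 1      # first offset the connector should read
end = start + flush_size - 1       # assumed inclusive end offset of the first file

name = expected_file_name("hdfs_kafka_topic_name", 0, start, end)
print(name)  # hdfs_kafka_topic_name+0+0000042000+0000042999.avro
```

If the first file written after the cutover does not start at the offset computed here, the connector did not begin where the consumer group offset was set.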

Is my expectation of the Confluent connector's behavior correct?


Answer 1:


When you restart this connector, it uses the offset embedded in the file name of the last file written to HDFS, not the consumer group offset. It does this because it uses a write-ahead log to achieve exactly-once delivery to HDFS.
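The recovery behavior described in this answer can be sketched roughly as follows. This is a simplified Python model, not the connector's actual code: the function `next_offset_from_files`, the regular expression, and the sample file names are hypothetical, and it assumes names follow the `topic+partition+start+end.format` pattern from the question with an inclusive end offset.

```python
import re

# Assumed file name shape: topic+partition+startOffset+endOffset.extension
FILE_NAME = re.compile(
    r"^(?P<topic>.+)\+(?P<partition>\d+)\+(?P<start>\d+)\+(?P<end>\d+)\.(?P<ext>\w+)$"
)


def next_offset_from_files(file_names, topic, partition):
    """Return the offset to resume from, derived only from committed file names.

    Returns None when no file exists for the given topic partition.
    """
    last_end = None
    for name in file_names:
        match = FILE_NAME.match(name)
        if match is None:
            continue  # not a data file in the expected shape
        if match["topic"] != topic or int(match["partition"]) != partition:
            continue  # belongs to another topic or partition
        end = int(match["end"])
        if last_end is None or end > last_end:
            last_end = end
    return None if last_end is None else last_end + 1


# Hypothetical directory listing left behind by an earlier run of the connector.
files = [
    "hdfs_kafka_topic_name+0+0000000000+0000000999.avro",
    "hdfs_kafka_topic_name+0+0000001000+0000001999.avro",
    "hdfs_kafka_topic_name+1+0000000000+0000000499.avro",
]

resume = next_offset_from_files(files, "hdfs_kafka_topic_name", 0)
print(resume)  # 2000
```

Under this model, the consumer group offset set with kafka-consumer-groups is never consulted, so files left in the target directory by an earlier run would decide where reading resumes.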



Source: https://stackoverflow.com/questions/49837808/camus-migration-kafka-hdfs-connect-does-not-start-from-the-set-offset
