Understanding transaction in Processor implementation in Kafka Streams

半世苍凉 submitted on 2019-12-05 07:28:03

Question


While using the Processor API of Kafka Streams, I use something like this:

context.forward(key, value);
context.commit();

Actually, what I'm doing here is forwarding a state from a state store to a sink every minute (using context.schedule() in the init() method). What I don't understand here is:

The [key, value] pair that I forward and then commit is taken from the state store. It is aggregated, according to my specific logic, from many non-sequential input [key, value] pairs: each output pair is an aggregation of a few unordered input pairs from the input Kafka topic. So I don't understand how the Kafka cluster and the Kafka Streams library can know the correlation between the original input [key, value] pairs and the eventual output [key, value] pair that is sent out. How can this be wrapped in a transaction (fail-safe) if Kafka doesn't know the connection between the input pairs and the output pair? And what is actually being committed when I call context.commit()?

Thanks!
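The setup described in the question can be sketched as a toy model in plain Java. This is a hypothetical illustration, not Kafka Streams code: `ToyAggregatingProcessor` is an invented class, a `TreeMap` stands in for the state store, and `punctuate()` stands in for the callback registered with `context.schedule()`. It shows that each forwarded aggregate mixes several interleaved inputs, while the only progress marker is the offset of the last input consumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy stand-in for the processor described in the question (no Kafka classes).
class ToyAggregatingProcessor {
    // Plays the role of the key-value state store: running sum per key.
    private final Map<String, Long> store = new TreeMap<>();
    // Offset of the last input record consumed, i.e. the progress on the input topic.
    private long lastOffset = -1;

    // Called once per input record; records for different keys may arrive interleaved.
    void process(long offset, String key, long value) {
        store.merge(key, value, Long::sum);
        lastOffset = offset;
    }

    // Stands in for the Punctuator registered via context.schedule():
    // forwards every aggregate currently held in the store.
    List<String> punctuate() {
        List<String> forwarded = new ArrayList<>();
        for (Map.Entry<String, Long> e : store.entrySet()) {
            forwarded.add(e.getKey() + "=" + e.getValue());
        }
        return forwarded;
    }

    long lastOffset() {
        return lastOffset;
    }
}
```

After processing `(0, "a", 1)`, `(1, "b", 5)` and `(2, "a", 2)`, a punctuation forwards `a=3` and `b=5`. `a=3` combines the inputs at offsets 0 and 2, yet nothing records that link: the forwarded records are simply a function of the store after consuming everything up to offset 2, and that input position is what a commit has to pair with the output.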


Answer 1:


Explaining all of this in detail goes beyond what I can write here in an answer.

Basically, when a transaction is committed, the current input topic offsets and all writes to Kafka topics (including the changelog topics that back your state stores) are committed atomically. This implies that all pending writes are flushed before the commit is done.

Transactions don't need to know about your actual business logic. They just "synchronize" the progress tracking on the input topics with writes to output topics.
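To make this concrete, here is a toy model in plain Java of what a transaction commit ties together: the input position, the state store's changelog, and the records forwarded to the sink. It is a hypothetical sketch, not Kafka code (`ToyTransactionalTask` and all its methods are invented), and it assumes exactly-once processing is enabled; in real Kafka Streams that is the `processing.guarantee` setting, and with the default at-least-once guarantee no transaction is used. Note that the model never records which input produced which output. It only guarantees that all three move forward together or not at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model (no Kafka classes) of what an exactly-once commit ties together.
class ToyTransactionalTask {
    // Committed state, i.e. what survives a failure.
    final List<String> committedOutput = new ArrayList<>();       // sink topic as read_committed consumers see it
    final Map<String, Long> committedChangelog = new TreeMap<>(); // state store changelog
    long committedPosition = -1; // last input offset covered by a commit
                                 // (Kafka itself stores the next offset to read, i.e. this + 1)

    // Uncommitted state belonging to the currently open transaction.
    private final List<String> pendingOutput = new ArrayList<>();
    private final Map<String, Long> store = new TreeMap<>();
    private long position = -1;

    // Rebuilds the store from the committed changelog and rewinds the input
    // position to the last commit, dropping all uncommitted work.
    void restore() {
        pendingOutput.clear();
        store.clear();
        store.putAll(committedChangelog);
        position = committedPosition;
    }

    long nextOffsetToRead() {
        return position + 1;
    }

    void process(long offset, String key, long value) {
        store.merge(key, value, Long::sum);
        position = offset;
    }

    // Scheduled callback: forwards every aggregate, but only into the open transaction.
    void punctuate() {
        for (Map.Entry<String, Long> e : store.entrySet()) {
            pendingOutput.add(e.getKey() + "=" + e.getValue());
        }
    }

    // Output records, changelog, and input position become visible together...
    void commit() {
        committedOutput.addAll(pendingOutput);
        committedChangelog.clear();
        committedChangelog.putAll(store);
        committedPosition = position;
        pendingOutput.clear();
    }

    // ...or not at all: simulates a crash or abort before commit().
    void abort() {
        restore();
    }
}
```

For example, after committing `a=3` for offsets 0 and 1, a failure while forwarding `a=13` for offset 2 leaves only `a=3` visible and rewinds the task to offset 2. Replaying offset 2 then produces `a=13` exactly once, instead of a duplicate record or a double-counted `a=23`. In the real Processor API, `context.commit()` only requests a commit, which Kafka Streams carries out at its next opportunity; it also commits on its own every `commit.interval.ms`.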

I would recommend reading the corresponding blog posts and watching the talks about exactly-once semantics in Kafka for more details:

  • Blog: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
  • Blog: https://www.confluent.io/blog/enabling-exactly-kafka-streams/
  • Talk: https://www.confluent.io/kafka-summit-nyc17/resource/#exactly-once-semantics_slide
  • Talk: https://www.confluent.io/kafka-summit-sf17/resource/#Exactly-once-Stream-Processing-with-Kafka-Streams_slide

Btw: about manual commits in the Streams API, you should also consider this question: How to commit manually with Kafka Stream?



Source: https://stackoverflow.com/questions/48258730/understanding-transaction-in-processor-implementation-in-kafka-streams
