How does Kafka Streams work with Partitions that contain incomplete Data?

Submitted by 与世无争的帅哥 on 2019-12-04 05:58:38

Kafka Streams repartitions your data automatically. Your program will look something like this:

stream.map(...).groupByKey().count();

For this pattern, Kafka Streams knows that map() may have set a new key, and it therefore creates an internal topic in the background to repartition the data for the groupByKey().count() step (as of v0.10.1, via KAFKA-3561).

Note that map() only "marks" the stream as requiring repartitioning; it is .groupByKey().count() that actually creates the repartition topic. In this sense, repartitioning is "lazy", i.e., it is only done if required. Without the .groupByKey().count() step, no repartitioning would be introduced.

Basically, the program above is executed in the same way as:

stream.map(...).through("some-topic").groupByKey().count();

Kafka Streams automatically "inserts" the through() step and thus computes the correct result.
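To see why the repartition step matters, here is a minimal plain-Java sketch that does not use the Kafka Streams API at all. It only simulates the idea: after map() the records carry new keys but still sit in the partitions chosen by their old keys, so a per-partition count sees incomplete data for each key. The class name, the method names, and the hashCode-based partitioner are all hypothetical stand-ins (Kafka's real default partitioner hashes the serialized key differently).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RepartitionSketch {

    // Hypothetical stand-in for a partitioner: the same key always maps to the same partition.
    public static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Count keys within each partition independently, the way parallel tasks would.
    public static List<Map<String, Long>> countPerPartition(List<List<String>> partitions) {
        List<Map<String, Long>> counts = new ArrayList<>();
        for (List<String> partition : partitions) {
            Map<String, Long> perKey = new TreeMap<>();
            for (String key : partition) {
                perKey.merge(key, 1L, Long::sum);
            }
            counts.add(perKey);
        }
        return counts;
    }

    // Simulates writing to and reading back from a repartition topic:
    // every record is routed to the partition determined by its (new) key.
    public static List<List<String>> repartition(List<List<String>> partitions, int numPartitions) {
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            result.add(new ArrayList<>());
        }
        for (List<String> partition : partitions) {
            for (String key : partition) {
                result.get(partitionFor(key, numPartitions)).add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // New keys as they come out of map(), still laid out by the OLD keys.
        List<List<String>> afterMap = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("a", "b"));

        // Without repartitioning, each partition holds only part of the records per key.
        System.out.println(countPerPartition(afterMap));                 // [{a=2, b=1}, {a=1, b=1}]

        // After repartitioning by the new key, each key lives in exactly one partition.
        System.out.println(countPerPartition(repartition(afterMap, 2))); // [{b=2}, {a=3}]
    }
}
```

Without the repartition step, no single partition ever sees all three "a" records, so neither partial count is correct on its own. That is the situation the automatically inserted step avoids.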

If you are using Kafka Streams 0.10.0, you need to create the repartition topic manually with the desired number of partitions, and you also need to add the call to through() to your code yourself.
