Why don't I see any output from the Kafka Streams reduce method?

Submitted by 纵饮孤独 on 2019-12-04 08:42:16

As of Kafka 0.10.1.0, all aggregation operators use an internal de-duplication cache to reduce the load on the result KTable changelog stream. For example, if you count and process two records with the same key directly after each other, the full changelog stream would be <key:1>, <key:2>.

With the new caching feature, the cache receives <key:1> and stores it, but does not send it downstream right away. When <key:2> is computed, it replaces the first entry in the cache. Depending on the cache size, the number of distinct keys, the throughput, and your commit interval, the cache sends entries downstream. This happens either on cache eviction of a single key entry or as a complete flush of the cache (sending all entries downstream). Thus, the KTable changelog might only show <key:2> (because <key:1> got de-duplicated).
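As a rough illustration of this behavior (a toy model, not Kafka's actual cache implementation), you can think of the cache as a per-key map that is only forwarded downstream on flush:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CacheDedupSketch {
    public static void main(String[] args) {
        // Toy model: the cache keeps only the latest value per key
        // and forwards nothing until it is flushed.
        Map<String, Integer> cache = new LinkedHashMap<>();
        List<String> downstream = new ArrayList<>();

        // Two updates for the same key arrive before a flush:
        cache.put("key", 1);   // <key:1> is stored, not forwarded
        cache.put("key", 2);   // <key:2> overwrites <key:1> in the cache

        // On flush (e.g. commit interval or eviction), only the
        // latest value per key is sent downstream:
        cache.forEach((k, v) -> downstream.add("<" + k + ":" + v + ">"));
        cache.clear();

        System.out.println(downstream);  // prints [<key:2>]
    }
}
```

The changelog consumer therefore never sees the intermediate <key:1> update.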

You can control the size of the cache via the Streams configuration parameter StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG. If you set the value to zero, you disable caching completely and the KTable changelog will contain all updates (effectively providing the pre-0.10.1.0 behavior).
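A minimal sketch of disabling the cache (shown with the plain config string "cache.max.bytes.buffering", which is the value behind StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, so the snippet runs without the kafka-streams dependency):

```java
import java.util.Properties;

public class DisableCaching {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Equivalent to:
        // props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
        // Setting the buffer to 0 bytes disables de-duplication entirely,
        // so every single update reaches the KTable changelog.
        props.put("cache.max.bytes.buffering", "0");

        System.out.println(props.getProperty("cache.max.bytes.buffering"));
    }
}
```

These properties would then be passed to the KafkaStreams constructor along with the rest of your Streams configuration.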

The Confluent documentation contains a section explaining the cache in more detail (see the "Memory management" page of the Kafka Streams developer guide).
