Events that should be emitted by a KTable

Submitted by 拜拜、爱过 on 2021-02-08 03:42:07

Question


I am trying to test a topology that has a KTable as its last node. My test uses a full-blown Kafka cluster (through Confluent's Docker images), so I am not using the TopologyTestDriver.

My topology has input of key-value type String -> Customer and output of type String -> CustomerMapped. The serdes, schemas, and integration with the Schema Registry all work as expected.

I am using Scala, Kafka 2.2.0, Confluent Platform 5.2.1 and kafka-streams-scala. My topology, as simplified as possible, looks something like this:

val otherBuilder = new StreamsBuilder()

otherBuilder
     .table[String, Customer](source)
     .mapValues(c => CustomerMapped(c.surname, c.age))
     .toStream.to(target)

(all implicit serdes, Produced, Consumed, etc. are the default ones and are resolved correctly)
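For reference, the value serdes might be wired up roughly as follows. This is a sketch only: it assumes Customer and CustomerMapped are Avro-generated SpecificRecord classes, and the Schema Registry URL is a placeholder, not taken from the original setup.

import scala.collection.JavaConverters._
import org.apache.avro.specific.SpecificRecord
import org.apache.kafka.common.serialization.Serde
import io.confluent.kafka.streams.serdes.avro.SpecificAvroSerde

def avroSerde[T <: SpecificRecord]: Serde[T] = {
  val serde = new SpecificAvroSerde[T]
  // "false" configures this as a value serde; the URL is a placeholder
  serde.configure(Map("schema.registry.url" -> "http://localhost:8081").asJava, false)
  serde
}

implicit val customerSerde: Serde[Customer] = avroSerde[Customer]
implicit val customerMappedSerde: Serde[CustomerMapped] = avroSerde[CustomerMapped]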

My test consists of sending a few records (data) to the source topic synchronously and without pause, then reading back from the target topic and comparing the results with the expected output (roughly as sketched after the expected data below):

val data: Seq[(String, Customer)] = Vector(
   "key1" -> Customer(0, "Obsolete", "To be overridden", 0),
   "key1" -> Customer(0, "Obsolete2", "To be overridden2", 0),
   "key1" -> Customer(1, "Billy", "The Man", 32),
   "key2" -> Customer(2, "Tommy", "The Guy", 31),
   "key3" -> Customer(3, "Jenny", "The Lady", 40)
)
val expected = Vector(
   "key1" -> CustomerMapped("The Man", 32),
   "key2" -> CustomerMapped("The Guy", 31),
   "key3" -> CustomerMapped("The Lady", 40)
)
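The produce-then-consume flow looks roughly like the following. This is a sketch: producerProps and consumerProps stand in for hypothetical producer/consumer configuration, and the poll timeout is arbitrary, not taken from the original test.

import java.time.Duration
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val producer = new KafkaProducer[String, Customer](producerProps)
// .get() blocks on each send, so the records arrive synchronously and in order
data.foreach { case (k, v) => producer.send(new ProducerRecord(source, k, v)).get() }
producer.close()

val consumer = new KafkaConsumer[String, CustomerMapped](consumerProps)
consumer.subscribe(Seq(target).asJava)
val results = consumer.poll(Duration.ofSeconds(30)).asScala
  .map(r => r.key() -> r.value())
  .toVector
assert(results == expected)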

I build the Kafka Streams application, setting, among other things, the following two properties:

// commit (and flush the record caches) at most every 5 seconds
p.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, "5000")
// allow up to 50 MB of record caching across all stream threads
val s: Long = 50L * 1024 * 1024
p.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, s.toString)

So I expect the KTable to use caching, with an interval of 5 seconds between commits and a cache size of 50 MB (more than enough for my scenario).
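Building and starting the application then looks roughly like this (a sketch; p is the Properties object from above):

import org.apache.kafka.streams.KafkaStreams

val streams = new KafkaStreams(otherBuilder.build(), p)
streams.start()
// ... run the test, then shut down cleanly
streams.close()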

My problem is that the results I read back from the target topic always contain multiple entries for key1. I would have expected no events to be emitted for the Obsolete and Obsolete2 records. The actual output is:

Vector(
    "key1" -> CustomerMapped("To be overridden", 0),
    "key1" -> CustomerMapped("To be overridden2", 0),
    "key1" -> CustomerMapped("The Man", 32),
    "key2" -> CustomerMapped("The Guy", 31),
    "key3" -> CustomerMapped("The Lady", 40)
)

One final thing to mention: this test worked as expected until I updated Kafka from 2.1.0 to 2.2.0. I verified this by downgrading my application again.

I am quite confused. Can anyone point out whether something changed in the behaviour of KTables in the 2.2.x versions? Or are there new settings I now have to set to control the emission of events?


Answer 1:


In Kafka 2.2, an optimization was introduced to reduce the resource footprint of Kafka Streams: a KTable is not necessarily materialized if it is not required for the computation. This applies in your case, because mapValues() can be computed on the fly. Because the KTable is not materialized, there is no cache, and thus each input record produces one output record.

See: https://issues.apache.org/jira/browse/KAFKA-6036

If you want to enforce KTable materialization, you can pass Materialized.as("someStoreName") into the StreamsBuilder#table() method.
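In the Scala DSL that would look roughly like the following sketch; the store name is arbitrary, and ByteArrayKeyValueStore is the type alias from the org.apache.kafka.streams.scala package:

import org.apache.kafka.streams.scala.ByteArrayKeyValueStore
import org.apache.kafka.streams.scala.kstream.Materialized

otherBuilder
     // materializing the table brings back the state store and its cache,
     // so intermediate updates for the same key are buffered again
     .table[String, Customer](source, Materialized.as[String, Customer, ByteArrayKeyValueStore]("customer-store"))
     .mapValues(c => CustomerMapped(c.surname, c.age))
     .toStream.to(target)

With the store materialized and caching enabled, only the latest value per key within a commit interval should be forwarded downstream, restoring the behaviour the test expects.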



Source: https://stackoverflow.com/questions/55687101/events-that-should-be-emitted-by-a-ktable
