Consume the latest value from a topic for each key

别来无恙 submitted on 2019-12-10 07:55:48

Question


I have a Kafka producer that produces messages at a high rate (the message key is, let's say, a username, and the value is that user's current score in a game). The Kafka consumer is relatively slow at processing the consumed messages. My requirement is to show the most up-to-date score and avoid showing stale data, with the trade-off that some scores may never be shown.

Essentially, for each username I may have hundreds of messages in the same partition, but I always want to read only the latest one.
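To make the requirement concrete: within one partition, records for a key arrive in offset order, so "latest" simply means the last record seen for that key. The following is a minimal plain-Java sketch (Java 16+ for records) of collapsing one consumed batch that way. The class `LatestPerKey` and its `Message` record are hypothetical stand-ins, not Kafka APIs. A plain consumer could apply something like this to each `poll()` batch before the slow processing step, although it only deduplicates within a single batch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: collapses a batch of (username, score) records so that
// only the most recent score per username survives ("latest value wins").
public class LatestPerKey {

    // Minimal stand-in for a consumed Kafka message (assumption, not a Kafka type).
    public record Message(String key, String value) {}

    // Records are assumed to be in offset order, as they are within one partition;
    // a later record for the same key overwrites the earlier one.
    public static Map<String, String> latest(List<Message> batch) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Message m : batch) {
            // Remove first so iteration order reflects each key's last update.
            result.remove(m.key());
            result.put(m.key(), m.value());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Message> batch = List.of(
                new Message("alice", "10"),
                new Message("bob", "7"),
                new Message("alice", "15"),
                new Message("alice", "20"));
        // Prints {bob=7, alice=20}
        System.out.println(latest(batch));
    }
}
```

Only the final score for each user reaches the slow processing step; the intermediate scores for `alice` are dropped, which is exactly the trade-off the question accepts.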

A crude solution that has been implemented works like this: the producer sends only the key as each message, and the actual value is written to a database shared with the consumer. The consumer reads each key from the queue and the corresponding value from the database. The goal of always reading the latest value is met because the producer overwrites the value in the database, so a consumer reading a given key actually gets the latest value. However, this solution has drawbacks due to the high number of reads and updates (it is slow, prone to race conditions, etc.).

I am looking for a more natural way of solving this in Kafka or Kafka Streams, where I can somehow get the latest value for each key from the stream of data. Thanks!


Answer 1:


The code below helped:

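import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder(); // replaces the deprecated KStreamBuilder
// A KTable keeps only the latest value per key; `client` and `config` are defined elsewhere
KTable<String, String> dataTable = builder.table("input-topic");
dataTable.toStream().foreach((key, message) -> client.post(message));
KafkaStreams streams = new KafkaStreams(builder.build(), config);
streams.start();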

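What makes this work in practice is the in-memory compaction (record caching) that Kafka Streams applies to the incoming stream (details explained here): successive updates for the same key are merged in the cache, and only the latest value is forwarded downstream when the cache is flushed. The amount of compaction, and therefore the pressure on the slow consumer, can be controlled with the parameters cache.max.bytes.buffering and commit.interval.ms.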
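As a sketch of how those two parameters might be set, the `config` passed to `KafkaStreams` is a `java.util.Properties` object. The values below are illustrative assumptions, as are the application id and broker address. For reference, the defaults are a 10 MB cache (10485760 bytes) and a 30 000 ms commit interval under at-least-once processing. The cache is flushed when it fills up and on every commit, so a larger cache and a longer interval collapse more intermediate updates per key, at the cost of showing each new score later.

```java
import java.util.Properties;

// Hypothetical configuration for the topology above; all values are illustrative.
public class StreamsConfigSketch {

    public static Properties buildConfig() {
        Properties config = new Properties();
        // Hypothetical application id and broker address.
        config.put("application.id", "latest-score-app");
        config.put("bootstrap.servers", "localhost:9092");
        // builder.table("input-topic") without explicit serdes relies on the default serdes.
        config.put("default.key.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        config.put("default.value.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        // Record cache size in bytes: more room means more keys are held between flushes,
        // so more intermediate updates per key are merged before being forwarded.
        config.put("cache.max.bytes.buffering", String.valueOf(50L * 1024 * 1024));
        // The cache is also flushed on each commit: a shorter interval shows scores sooner
        // but forwards more intermediate updates; a longer one does the opposite.
        config.put("commit.interval.ms", "1000");
        return config;
    }

    public static void main(String[] args) {
        Properties config = buildConfig();
        System.out.println(config.getProperty("cache.max.bytes.buffering"));
        System.out.println(config.getProperty("commit.interval.ms"));
    }
}
```

Because a score that is overwritten before the next flush is never forwarded, these settings directly trade freshness against how many stale scores reach `client.post`.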



Source: https://stackoverflow.com/questions/50065094/consume-latest-value-from-a-topic-for-each-keys
