Left joining a KStream on another KStream, but only with “latest” results

耗尽温柔 submitted on 2019-12-11 07:23:18

Question


I have a data stream on Kafka that I consume as a KStream. Alongside it I have a metadata stream that I would like to enrich the data stream with. This is a fairly common scenario that appears in several examples.

What I haven't solved is the case where the metadata stream contains more than one record within the specified join window. What is usually wanted in this scenario is to join with the latest (that is, the last) element from the metadata stream. For example, a sales order should be materialised once, with the latest customer object, not twice, once for each sequential customer update.
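To pin down the desired semantics, here is a minimal plain-Java sketch (no Kafka dependency) of "use only the latest metadata record in the window" for a single data event. The record `Meta`, the method `latestInWindow`, and the symmetric ±`windowMs` window are hypothetical names and assumptions for illustration; they are not part of the Kafka Streams API.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class LatestInWindow {
    // Hypothetical metadata record: event timestamp (ms) plus payload.
    record Meta(long ts, String value) {}

    // Returns the metadata record with the greatest timestamp inside the
    // symmetric window [eventTs - windowMs, eventTs + windowMs], or empty if
    // nothing falls in the window (the "null" side of a left join).
    static Optional<Meta> latestInWindow(List<Meta> metas, long eventTs, long windowMs) {
        return metas.stream()
                .filter(m -> m.ts() >= eventTs - windowMs && m.ts() <= eventTs + windowMs)
                .max(Comparator.comparingLong(Meta::ts));
    }

    public static void main(String[] args) {
        // Arrival order does not matter: selection is by timestamp.
        List<Meta> metas = List.of(
                new Meta(300, "customer v3"),
                new Meta(200, "customer v2"),
                new Meta(100, "customer v1"));
        // Event at ts=320 with a ±150 ms window sees v2 (200) and v3 (300);
        // only v3, the latest, is used. v1 (100) is outside the window.
        System.out.println(latestInWindow(metas, 320, 150)
                .map(Meta::value)
                .orElse("no match"));
    }
}
```

Because the choice is made by timestamp rather than by arrival order, a metadata record that arrives late but carries an older timestamp cannot displace a newer one within the same window.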

Imagine the following scenario:

When element 7 (green) arrives, it is joined with both 2 and 3 from the metadata stream, even though only 3 is relevant (in my case).
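The behaviour described above can be reproduced with a simplified plain-Java model of a windowed left join: the data event is paired with every metadata record inside the window, so two matching updates yield two output rows. The timestamps, the ±`windowMs` window, and the names `Event`, `Meta` and `join` are hypothetical; the model ignores when Kafka Streams actually emits results and only shows which pairs match.

```java
import java.util.ArrayList;
import java.util.List;

class WindowedLeftJoin {
    // Hypothetical records: a data event and a metadata update, each with a timestamp (ms).
    record Event(long ts, String id) {}
    record Meta(long ts, String id) {}

    // Simplified windowed left join: pair the event with EVERY metadata record
    // whose timestamp is within ±windowMs. If nothing matches, emit (event, null).
    static List<String> join(Event e, List<Meta> metas, long windowMs) {
        List<String> out = new ArrayList<>();
        for (Meta m : metas) {
            if (Math.abs(m.ts() - e.ts()) <= windowMs) {
                out.add(e.id() + "+" + m.id());
            }
        }
        if (out.isEmpty()) {
            out.add(e.id() + "+null");
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the scenario: metadata 2 and 3 fall inside event 7's window, 1 does not.
        List<Meta> metas = List.of(new Meta(10, "1"), new Meta(40, "2"), new Meta(55, "3"));
        System.out.println(join(new Event(60, "7"), metas, 30)); // [7+2, 7+3]
    }
}
```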

I realise this could be a good match for a KStream-KTable join, where the KTable would contain only the latest record from the metadata stream. But that has the major disadvantage that it does not cope well with late and out-of-order data.
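One way to see the out-of-order problem is a simplified model of the table side. A table that applies updates in arrival order (last write wins) ends up holding a stale value when an older update arrives late, whereas a timestamp-aware table keeps the newer one. A stream-table join also looks up whatever the table holds at the moment the stream record is processed, so the table's content at that moment is what gets joined. This is a plain-Java sketch under those assumptions; `Update`, `byArrival` and `byTimestamp` are hypothetical names, not Kafka Streams classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TableUpdateOrder {
    // Hypothetical customer update: key, event timestamp (ms), payload.
    record Update(String key, long ts, String value) {}

    // Last write wins by ARRIVAL order: whatever arrived last overwrites the entry.
    static Map<String, Update> byArrival(List<Update> updates) {
        Map<String, Update> table = new HashMap<>();
        for (Update u : updates) {
            table.put(u.key(), u);
        }
        return table;
    }

    // Timestamp-aware variant: an update replaces the entry only if it is not
    // older by event time, so a late, older update cannot overwrite fresher data.
    static Map<String, Update> byTimestamp(List<Update> updates) {
        Map<String, Update> table = new HashMap<>();
        for (Update u : updates) {
            table.merge(u.key(), u, (current, incoming) ->
                    incoming.ts() >= current.ts() ? incoming : current);
        }
        return table;
    }

    public static void main(String[] args) {
        // The ts=100 update arrives late, after the ts=200 one.
        List<Update> updates = List.of(
                new Update("c1", 200, "address B"),
                new Update("c1", 100, "address A"));
        System.out.println(byArrival(updates).get("c1").value());   // address A (stale)
        System.out.println(byTimestamp(updates).get("c1").value()); // address B
    }
}
```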

The question boils down to: how do I join a KStream with another KStream, but only with the latest event in the latter?

Source: https://stackoverflow.com/questions/47495299/left-joining-a-kstream-on-another-kstream-but-only-with-latest-results
