End-of-window outer join with KafkaStreams

Posted by 泪湿孤枕 on 2019-11-30 22:11:45

The DSL might not give you what you want, but you can use the Processor API. Having said that, the leftJoin can actually be used to do the "heavy lifting": after the leftJoin you can use .transform(...) with an attached state store to "clean up" the data further.

For each old&null record you receive, put it into the store. If you later receive an old&new record for the same key, remove the entry from the store. Furthermore, register a punctuation, and on each punctuation call scan the store for entries that are "old enough" that you can be sure no later old&new join result will be produced. For those entries, emit old&null and remove them from the store.
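The store-and-scan logic described above can be sketched without any Kafka dependency. The following is only a minimal model, not the actual transformer: the class name PendingLeftJoinBuffer, the gracePeriodMs parameter, and the use of an in-memory map in place of a real state store are all assumptions made for illustration, and it assumes at most one pending "old" record per key.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, library-free model of the state kept by the transform() step after the leftJoin. */
final class PendingLeftJoinBuffer {
    // Assumed: how long to wait before an unmatched "old" record is considered final.
    private final long gracePeriodMs;
    // Stands in for the state store: key -> timestamp of the old&null result.
    private final Map<String, Long> pending = new LinkedHashMap<>();

    PendingLeftJoinBuffer(long gracePeriodMs) {
        this.gracePeriodMs = gracePeriodMs;
    }

    /** Called for every left-join result; newValue == null means an old&null result. */
    void onJoinResult(String key, String newValue, long timestampMs) {
        if (newValue == null) {
            pending.putIfAbsent(key, timestampMs); // remember the unmatched "old" record
        } else {
            pending.remove(key);                   // a later old&new arrived, so stop tracking it
        }
    }

    /** Called on each punctuation; returns the keys to emit as old&null and forgets them. */
    List<String> punctuate(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() >= gracePeriodMs) { // "old enough"
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

In a real topology, onJoinResult would correspond to the transformer's per-record method and punctuate to the callback registered with the processor context; the map would be replaced by a key-value state store so the pending entries survive restarts.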

As an alternative, you can omit the join and do everything in a single transform() with state. For this, you would need to KStream#merge() the old and new streams and call transform() on the merged stream.

Note: instead of registering a punctuation, you can also put the "scan logic" into the transform and execute it each time you process a record.
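To illustrate the join-free variant with the scan executed on every record, here is a rough, library-free sketch. Everything in it is an assumption for illustration: the class name MergedStreamMatcher, the isOld flag used to tell the two merged inputs apart, a symmetric window of windowMs, roughly ordered timestamps, and in-memory maps standing in for state stores. Unlike the leftJoin approach, the code has to handle both arrival orders itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a single stateful transform() over the merged old/new stream. */
final class MergedStreamMatcher {
    private final long windowMs;                                          // assumed symmetric window
    private final Map<String, Long> unmatchedOld = new LinkedHashMap<>(); // key -> "old" timestamp
    private final Map<String, Long> seenNew = new HashMap<>();            // key -> "new" timestamp

    MergedStreamMatcher(long windowMs) {
        this.windowMs = windowMs;
    }

    /** Processes one merged record and returns the keys whose "old" record timed out unmatched. */
    List<String> process(String key, boolean isOld, long timestampMs) {
        if (isOld) {
            Long newTs = seenNew.get(key);
            boolean matched = newTs != null && Math.abs(timestampMs - newTs) <= windowMs;
            if (!matched) {
                unmatchedOld.put(key, timestampMs); // wait for a "new" record to show up
            }
        } else {
            seenNew.put(key, timestampMs);
            Long oldTs = unmatchedOld.get(key);
            if (oldTs != null && Math.abs(timestampMs - oldTs) <= windowMs) {
                unmatchedOld.remove(key);           // matched within the window
            }
        }
        return scan(timestampMs);                   // scan on every record instead of a punctuation
    }

    private List<String> scan(long nowMs) {
        // Drop "new" records that can no longer match anything, to bound the state.
        seenNew.values().removeIf(ts -> nowMs - ts > windowMs);
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = unmatchedOld.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > windowMs) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

One consequence of scanning only when a record is processed is that nothing is emitted while the input is idle; a punctuation avoids that at the cost of registering the extra callback.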

If I understand your question correctly, you only want to report IDs as suspicious when there is an "old" record without a corresponding "new" record within the 2-minute window.

If that's the case, you'll want to use a left join:

val leftJoined = oldStream.leftJoin(newStream,...).filter(condition where value expected from "new" stream is null);
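As a rough illustration of what the filter step does, the sketch below models left-join results as plain objects and keeps only those whose "new" side is null. It does not use the Kafka Streams API; the names NullSideFilter, JoinResult, and suspiciousIds are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, library-free model of filtering left-join results on a null "new" side. */
final class NullSideFilter {
    /** One left-join result: the id plus the value from the "new" stream (null if none matched). */
    static final class JoinResult {
        final String id;
        final String newValue;

        JoinResult(String id, String newValue) {
            this.id = id;
            this.newValue = newValue;
        }
    }

    /** Returns the ids of all results whose "new" side is null, in input order. */
    static List<String> suspiciousIds(List<JoinResult> results) {
        List<String> ids = new ArrayList<>();
        for (JoinResult result : results) {
            if (result.newValue == null) {
                ids.add(result.id);
            }
        }
        return ids;
    }
}
```

Note that if the join emits an old&null result first and an old&new result for the same id later, this filter alone still reports that id; that is the case the store-based cleanup in the first answer is meant to handle.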

HTH

This looks like what you were looking for: Kafka Streams left outer join on timeout.

It works around the lack of SQL-like left join semantics in the Kafka Streams framework. The implementation generates a left join event only if no full join event happened within the join window duration.
