How to discover and filter out duplicate records in Kafka Streams


You can use more than one attribute for grouping. Create a custom key by concatenation and pass as key:

// Assuming the record value is a POJO (here called User) exposing getName() and getId();
// a sketch of such a class follows the example below.
KTable<String, User> modifiedTable = nameStream
        .groupBy((key, value) -> value.getName() + value.getId())
        .reduce((aggVal, newVal) -> newVal);   // keep the latest record for each name+id key

The KTable above holds the latest record for each name + ID combination. So if {id:1, name:Chris, ...} arrives repeatedly, the KTable keeps only a single record for it (under the key "Chris1"), while in the case below, where the IDs differ, both records will be present:

<Chris1, {id:1, name:Chris, age:99}>
<Chris2, {id:2, name:Chris, age:xx}>
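
For reference, here is a minimal sketch of the value class these snippets assume. The class name User and its field types are illustrative; the answer only relies on the value exposing getName() and getId():

// Hypothetical value class; the snippets only require getName() and getId().
public class User {
    private int id;
    private String name;
    private String age;   // type chosen arbitrarily for this sketch

    public User() { }     // no-arg constructor, e.g. for JSON deserialization

    public int getId() { return id; }
    public String getName() { return name; }
    public String getAge() { return age; }

    public void setId(int id) { this.id = id; }
    public void setName(String name) { this.name = name; }
    public void setAge(String age) { this.age = age; }
}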

Now you want to count by the name attribute. So change the key to the name, re-group the table, and perform count().

KTable<String, Long> countTable = modifiedTable.groupBy((k, v) -> KeyValue.pair(v.getName(), v)).count();

Here count() is performed on top of the KTable, which holds only the latest record for any given name + ID key.
Hence for the input below, modifiedTable will hold at most one record at a time as the updated value for the key "Chris1", and you will get count => 1 for the name "Chris":

{id:1, name:Chris, ...}  // Here the key in modifiedTable will be Chris1

The input below will result in count => 2 for the name "Chris":

{id:1, name:Chris, age:99}  // Here the key will be Chris1
{id:2, name:Chris, age:xx}  // Here the key will be Chris2
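
Putting both steps together, here is a minimal end-to-end sketch under the same assumptions. The input topic name "users", the User class above, and the buildUserSerde() helper are placeholders; only the groupBy/reduce and groupBy/count calls come from the original snippets:

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class DeduplicateAndCount {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Assumed: a Serde for the User value class (e.g. built from a JSON library).
        Serde<User> userSerde = buildUserSerde();

        // Assumed input topic name.
        KStream<String, User> nameStream =
                builder.stream("users", Consumed.with(Serdes.String(), userSerde));

        // Step 1: re-key by name + id and reduce, so the table holds
        // at most one (latest) record per name/id combination.
        KTable<String, User> modifiedTable = nameStream
                .groupBy((key, value) -> value.getName() + value.getId(),
                         Grouped.with(Serdes.String(), userSerde))
                .reduce((aggVal, newVal) -> newVal);

        // Step 2: re-key the deduplicated table by name and count,
        // giving the number of distinct ids seen per name.
        KTable<String, Long> countTable = modifiedTable
                .groupBy((k, v) -> KeyValue.pair(v.getName(), v),
                         Grouped.with(Serdes.String(), userSerde))
                .count();

        countTable.toStream().foreach((name, count) ->
                System.out.println(name + " => " + count));

        // Build and start a KafkaStreams instance with this topology as usual.
    }

    // Placeholder: construct a Serde for User (e.g. with a JSON serializer/deserializer).
    private static Serde<User> buildUserSerde() {
        throw new UnsupportedOperationException("provide a Serde for User");
    }
}

One useful property of counting on top of a KTable: when an existing name + ID entry is merely updated (for example the age changes), the re-grouping emits both the old and the new value, so count() subtracts one and adds one and the total for that name does not grow. That is exactly the deduplication behaviour described above.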