Kafka Streams - reducing the memory footprint for large state stores

╄→尐↘猪︶ㄣ 提交于 2019-12-04 19:15:18

With your current pattern

stream.....reduce(.)toStream().to(OUTPUT_TOPIC);
builder.table(OUTPUT_TOPIC, REKEYED_STORE)

you get two stores with the same content. One for the reduce() operator and one for reading the table() -- this can be reduced to one store though:

KTable rekeyedTable  = stream.....reduce(.);
rekeyedTable.toStream().to(OUTPUT_TOPIC); // in case you need this output topic; otherwise you can also omit it completely

This should reduce you memory usage notably.

About windowing vs non-windowing:

  1. it's a matter of your required semantics; so simple switching from a non-windowed to a windowed reduce seems to be questionable.

  2. Even if you can also go with windowed semantics, you would not necessarily reduce memory. Note, in aggregation case, Streams does not store the raw records but only the current aggregate result (ie, key + currentAgg). Thus, for a single key, the storage requirement is the same for both cases (a single window has the same storage requirement). At the same time, if you go with windows, you might actually need more memory as you get an aggregate pro key pro window (while you get just a single aggregate pro key in the non-window case). The only scenario you might save memory, is the case for which you 'key space' is spread out over a long period of time. For example, you might not get any input records for some keys for a long time. In the non-windowed case, the aggregate(s) of those records will be stores all the time, while for the windowed case the key/agg record will be dropped and new entried will be re-created if records with this key occure later on again (but keep in mind, that you lost the previous aggergate in this case -- cf. (1))

Last but not least, you might want to have a look into the guidelines for sizing an application: http://docs.confluent.io/current/streams/sizing.html

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!