Kafka KStream application - temp file cleanup

Posted by 萝らか妹 on 2019-12-11 13:54:26

Question


It seems that my KStream-based application has been piling up many GBs of files (.sst, Log.old.<stamp>, etc.).

Will these get cleaned up on their own or is this something I need to keep an eye on? Some param to be set to cull them?


Answer 1:


About these local/temp files: Some of these files are application state, and those should account for the majority of space consumed. Your application may be "piling up" many GBs of files simply because your application is actually managing a lot of state. These files can be reconstructed (automatically) by replaying the state's changelog from Kafka if you delete them, but this may take some time.
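To check whether state really accounts for most of the space, one option is to total the bytes under the state directory by kind of file. The following is only a sketch using the Java standard library: the class and method names are made up for illustration, the grouping into "sst", "log" and "other" is an assumption based on the file names mentioned in the question, and /tmp/kafka-streams is assumed to be the `state.dir` in use (pass a different path if yours differs).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kafka Streams.
class StateDirUsage {

    // Buckets a file name: *.sst data files, LOG / LOG.old.<stamp> info logs, everything else.
    static String category(String fileName) {
        if (fileName.endsWith(".sst")) {
            return "sst";
        }
        if (fileName.toUpperCase(Locale.ROOT).startsWith("LOG")) {
            return "log";
        }
        return "other";
    }

    // Walks the tree under root and sums regular-file sizes per category.
    static Map<String, Long> bytesByCategory(Path root) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                try {
                    totals.merge(category(p.getFileName().toString()), Files.size(p), Long::sum);
                } catch (IOException e) {
                    // The file disappeared or is unreadable (the app may still be running); skip it.
                }
            });
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: /tmp/kafka-streams is the state directory unless a path is given.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/kafka-streams");
        bytesByCategory(root).forEach((kind, bytes) ->
                System.out.printf("%-6s %,d bytes%n", kind, bytes));
    }
}
```

If the "sst" bucket dominates, the space is most likely genuine application state rather than leftover logs.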

Will these get cleaned up on their own or is this something I need to keep an eye on? Some param to be set to cull them?

Some cleaning up is done, but as I wrote above, the files most probably consume that space for a reason. Perhaps you can share a snippet of the app's processing topology as well as some info about the data the app is processing, which might help to understand whether the consumed space seems about right or whether there might be an issue.
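As for a parameter: the settings that seem most relevant are `state.dir` (where local state is kept) and `state.cleanup.delay.ms` (how long to wait before deleting the state of a partition that has migrated away from this instance). Note that the latter does not shrink the state of tasks the instance is still running. A minimal sketch, with a made-up class name and with the keys written as plain strings so it compiles without the Kafka jars on the classpath:

```java
import java.util.Properties;

// Hypothetical helper; the property keys are Kafka Streams configuration names.
class StreamsStateConfig {

    // Collects the state-related settings; a real application would add
    // bootstrap.servers, serdes, and so on before building its streams config.
    static Properties stateSettings(String applicationId, String stateDir, long cleanupDelayMs) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        // Directory under which local state is written.
        props.setProperty("state.dir", stateDir);
        // Wait time before state of migrated partitions is removed from local disk.
        props.setProperty("state.cleanup.delay.ms", Long.toString(cleanupDelayMs));
        return props;
    }
}
```

For example, `StreamsStateConfig.stateSettings("my-app", "/var/lib/kafka-streams", 60000L)` would keep state off /tmp and allow one minute before migrated state is removed; the application id, path and delay here are purely illustrative.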

Clean up: The latest version of Kafka at the time of writing (0.10.0.1) ships with an application reset tool for Kafka Streams plus some accompanying API methods that help with cleaning/resetting; see Data Reprocessing with Kafka Streams: Resetting a Streams Application. That said, I am not sure whether you intend to clean up files because you have stopped the application and want to get rid of all the local data, or because you want to do some "garbage collection" while the app is still running. If it's the latter (GC), then in general there's no need to: the files are there for a good reason, and most probably will just be recreated.
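For the first case (the application is stopped and you want the local data gone), the API method to look at is `KafkaStreams#cleanUp()`, which removes the application's local state directory. The sketch below only approximates that by hand with the Java standard library, under the assumption that the local state of an application lives in `<state.dir>/<application.id>`; the class and method names are made up. Run something like this only while the application is stopped, and expect the state to be rebuilt from the changelog topics on the next start.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kafka Streams.
class LocalStateWipe {

    // Recursively deletes <stateDir>/<applicationId> and returns how many
    // files and directories were removed (0 if the directory does not exist).
    static int wipe(Path stateDir, String applicationId) throws IOException {
        Path base = stateDir.toAbsolutePath().normalize();
        Path appDir = base.resolve(applicationId).normalize();
        // Refuse ids that would delete the state dir itself or escape from it.
        if (appDir.equals(base) || !appDir.startsWith(base)) {
            throw new IllegalArgumentException("Suspicious application id: " + applicationId);
        }
        if (!Files.exists(appDir)) {
            return 0;
        }
        List<Path> entries;
        try (Stream<Path> paths = Files.walk(appDir)) {
            // Reverse order puts children before their parent directories.
            entries = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path entry : entries) {
            Files.delete(entry);
        }
        return entries.size();
    }
}
```

This only touches local files. Resetting offsets and internal topics on the broker side is what the application reset tool is for.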



Source: https://stackoverflow.com/questions/39275886/kafka-kstream-application-temp-file-cleanup
