How to handle multiline log entries in Flume

Submitted by 不羁的心 on 2019-12-08 03:32:28

As the documentation states, the spooldir source creates a new event for each string of characters separated by a newline in the input data. You can modify this behaviour by creating your own source, based on the code of the spooldir source (the developer guide at http://flume.apache.org/FlumeDeveloperGuide.html#sink describes how custom components are written). You'll need to implement a parsing algorithm that is able to detect the start and end lines of a message based on some criteria.
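As a minimal sketch of such a parsing algorithm, the plain Java class below treats any line that begins with a date as the start of a new message and folds every other line into the message before it. It does not use any Flume API; the class name `StartPatternSplitter`, the method `split`, and the date pattern are assumptions chosen for illustration, and the real criterion depends on your log format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class StartPatternSplitter {
    // Assumed criterion: a message starts with a date such as 2019-12-08.
    private static final Pattern ENTRY_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2}\\b");

    // Groups raw lines into messages; a message ends where the next one starts.
    static List<String> split(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            boolean startsEntry = ENTRY_START.matcher(line).find();
            if (startsEntry || current == null) {
                // Flush the message collected so far, then begin a new one.
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                // Not a start line: it belongs to the current message.
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }
}
```

Note that with this kind of criterion the end of a message is only known once the next start line (or the end of the input) is seen, so a streaming implementation has to buffer the message in progress.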

Also, there are other sources, such as Syslog UDP and Avro, that treat an entire received message as a single event, so you can use them without any modification.

You'll want to look into extending the line deserializer used by the spooldir source. One simple (but potentially flawed) approach would be to delimit on newlines, but append any line that is prefixed with a set number of spaces to the previous line.
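The indentation rule can be sketched in plain Java as follows. This is only an illustration of the grouping logic, not a Flume deserializer: the class name `IndentContinuationGrouper`, the method `group`, and the `indent` parameter are hypothetical, and wiring the logic into Flume's deserializer interface is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class IndentContinuationGrouper {
    // Combines lines that start with `indent` spaces into the previous event.
    static List<String> group(List<String> lines, int indent) {
        // Build the expected prefix of `indent` spaces.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indent; i++) {
            sb.append(' ');
        }
        String prefix = sb.toString();

        List<String> events = new ArrayList<>();
        for (String line : lines) {
            boolean continuation =
                    indent > 0 && !events.isEmpty() && line.startsWith(prefix);
            if (continuation) {
                // Append the indented line to the event that precedes it.
                int last = events.size() - 1;
                events.set(last, events.get(last) + "\n" + line);
            } else {
                // Any other line starts a new event.
                events.add(line);
            }
        }
        return events;
    }
}
```

One way this approach falls short: a continuation line that is not indented, such as a `Caused by:` line in a Java stack trace, would start a new event.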

In fact there is already a Jira issue for this with a patch:
