How to handle multiline log entries in Flume

Submitted by 笑着哭i on 2020-01-03 02:49:06

Question


I have just started playing with Flume. How can I handle multiline log entries, such as stack traces logged during error conditions, as a single event? For example, I would like the following to be treated as a single event rather than as one event per line:

2013-04-05 05:00:41,280 ERROR (ClientRequestPool-PooledExecutionEngine-Id#4 ) [com.ms.fw.rexs.gwy.api.service.AbstractAutosysJob] job failed for 228794
java.lang.NullPointerException
    at com.ms.fw.rexs.core.impl.service.job.ReviewNotificationJobService.createReviewNotificationMessageParameters(ReviewNotificationJobService.java:138)
    ....

I have configured the source with type spooldir.

Thank you, Suman


Answer 1:


As the documentation states, the spooldir source creates a new event for each string of characters separated by a newline in the input data. Because this splitting happens in the source, you can change the behaviour by writing your own source based on the code of the spooldir source (the Flume Developer Guide covers custom components; see http://flume.apache.org/FlumeDeveloperGuide.html#sink). You will need to implement a parsing algorithm that can detect the first and last lines of a message based on some criterion.
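As a minimal sketch of such a parsing algorithm, the Java below groups lines into entries by treating any line that begins with a timestamp as the start of a new message. It is plain Java and does not use any Flume API. The class name `TimestampGrouper` and the timestamp pattern are assumptions made for illustration, with the pattern modelled on the sample log line in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flume: groups raw lines into multiline entries.
class TimestampGrouper {
    // Assumption: every new entry starts with "yyyy-MM-dd HH:mm:ss,SSS ".
    private static final Pattern ENTRY_START =
            Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3} ");

    static List<String> group(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (ENTRY_START.matcher(line).find()) {
                // A timestamp marks the start of a new entry: flush the previous one.
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(line);
            } else if (current != null) {
                // No timestamp: treat the line as a continuation of the current entry.
                current.append('\n').append(line);
            } else {
                // Input began without a timestamp: start an entry anyway.
                current = new StringBuilder(line);
            }
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }
}
```

In this sketch an entry is only known to be complete once the next timestamped line (or the end of input) is seen, so a custom source built on the same idea would have to buffer the most recent entry until then.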

Alternatively, other sources, such as Syslog UDP and Avro, treat an entire received message as a single event, so you can use them without any modification.




Answer 2:


You'll want to look into extending the line deserializer used by the spooling directory source. One simple (but potentially flawed) approach would be to delimit on newlines but append any line that is prefixed with a set number of spaces to the previous line.

In fact there is already a Jira issue for this with a patch:

  • https://issues.apache.org/jira/browse/FLUME-2779
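
The whitespace-prefix approach described in this answer can be sketched roughly as follows. This is plain Java showing only the line-joining logic; it is not Flume's deserializer interface and not the patch attached to the Jira issue. The class name `IndentJoiner` and the `minIndent` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Flume class: joins indented continuation lines.
class IndentJoiner {
    // Lines with at least minIndent leading spaces are appended to the previous event.
    static List<String> join(List<String> lines, int minIndent) {
        List<String> events = new ArrayList<>();
        for (String line : lines) {
            boolean continuation = !events.isEmpty() && leadingSpaces(line) >= minIndent;
            if (continuation) {
                int last = events.size() - 1;
                events.set(last, events.get(last) + "\n" + line);
            } else {
                events.add(line);
            }
        }
        return events;
    }

    // Counts space characters at the start of the line (tabs are not counted).
    private static int leadingSpaces(String line) {
        int n = 0;
        while (n < line.length() && line.charAt(n) == ' ') {
            n++;
        }
        return n;
    }
}
```

The sketch also shows why the approach is potentially flawed: an unindented line such as `java.lang.NullPointerException` starts a new event, and Java's default stack trace output indents frames with a tab rather than spaces, so a real implementation would probably need to treat tabs as indentation too.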


Source: https://stackoverflow.com/questions/16037023/how-to-handle-multiline-log-entries-in-flume
