Question
I have just started playing with Flume. I have a question about how to handle multiline log entries, such as stack traces logged during error conditions, as a single event. For example, I'd like the entry below to be treated as a single event rather than one event per line:
2013-04-05 05:00:41,280 ERROR (ClientRequestPool-PooledExecutionEngine-Id#4 ) [com.ms.fw.rexs.gwy.api.service.AbstractAutosysJob] job failed for 228794
java.lang.NullPointerException
    at com.ms.fw.rexs.core.impl.service.job.ReviewNotificationJobService.createReviewNotificationMessageParameters(ReviewNotificationJobService.java:138)
    ....
I have configured the source as a spooldir type.
Thank you, Suman
Answer 1:
As the documentation states, the spooldir source creates a new event for each string of characters separated by a newline in the input data. You can change this behaviour by writing your own source based on the code of the spooldir source (see the developer guide at http://flume.apache.org/FlumeDeveloperGuide.html#sink). You'll need to implement a parsing algorithm that can detect the first and last line of a message based on some criteria.
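One possible criterion for detecting where a message starts is a leading timestamp. The sketch below is only an illustration of that parsing idea in Python, not Flume code: the function name `group_by_timestamp`, the regular expression, and the sample lines are all assumptions made for this example, and it assumes every new record begins with a timestamp shaped like `2013-04-05 05:00:41,280`.

```python
import re
from typing import Iterable, Iterator

# Assumption: a new log record always starts with a timestamp such as
# "2013-04-05 05:00:41,280 "; any other line continues the previous record.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")


def group_by_timestamp(lines: Iterable[str]) -> Iterator[str]:
    """Yield one string per log record, joining continuation lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\n")
        # A timestamp marks the start of the next record, so flush
        # whatever has been collected for the current one.
        if RECORD_START.match(line) and buffer:
            yield "\n".join(buffer)
            buffer = []
        buffer.append(line)
    # Emit the last record once the input is exhausted.
    if buffer:
        yield "\n".join(buffer)


# Made-up sample input, loosely modelled on the log entry in the question.
sample = [
    "2013-04-05 05:00:41,280 ERROR job failed for 228794\n",
    "java.lang.NullPointerException\n",
    "    at com.example.Job.run(Job.java:138)\n",
    "2013-04-05 05:00:42,001 INFO next job started\n",
]
events = list(group_by_timestamp(sample))
```

With this input, the three lines of the error and its stack trace end up in one event and the following INFO line becomes a second event. Note that a record is only known to be complete when the next timestamp (or the end of the input) is seen, so a real implementation would have to decide how long to wait for the last record.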
Also, there are other sources, such as Syslog UDP and Avro, that treat an entire received message as a single event, so you can use them without any modification.
Answer 2:
You'll want to look into extending the line deserializer used by the spooldir source. One simple (but potentially flawed) approach would be to delimit on newlines, but append any line that is prefixed with a set number of spaces to the previous line.
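The combining rule described above can be sketched as follows. This is a Python illustration of the logic only, not a Flume deserializer: the function name `merge_indented`, the `min_indent` parameter, and the sample lines are hypothetical, and the code treats both spaces and tabs as indentation.

```python
from typing import Iterable, Iterator


def merge_indented(lines: Iterable[str], min_indent: int = 1) -> Iterator[str]:
    """Yield events, appending indented lines to the preceding line."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Count leading spaces and tabs on this line.
        indent = len(line) - len(line.lstrip(" \t"))
        if buffer and indent >= min_indent:
            # Indented line: treat it as a continuation of the current event.
            buffer.append(line)
        else:
            # Unindented line: flush the current event and start a new one.
            if buffer:
                yield "\n".join(buffer)
            buffer = [line]
    if buffer:
        yield "\n".join(buffer)


# Made-up sample input for illustration.
sample_lines = [
    "ERROR job failed",
    "    at a.B.c(B.java:1)",
    "    at d.E.f(E.java:2)",
    "INFO ok",
]
merged = list(merge_indented(sample_lines))
```

One way this approach is flawed: an unindented continuation line, such as the `java.lang.NullPointerException` line in the question, would start a new event instead of joining the error line before it.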
In fact there is already a Jira issue for this with a patch:
- https://issues.apache.org/jira/browse/FLUME-2779
Source: https://stackoverflow.com/questions/16037023/how-to-handle-multiline-log-entries-in-flume