I have an http server writing log files which I then load into HDFS using Flume First I want to filter data according to data I have in my header or body. I read that I can
You don't need to write Java code to filter events. Use Regex Filtering Interceptor to filter events which body text matches some regular expression:
agent.sources.logs_source.interceptors = regex_filter_interceptor
agent.sources.logs_source.interceptors.regex_filter_interceptor.type = regex_filter
agent.sources.logs_source.interceptors.regex_filter_interceptor.regex =
agent.sources.logs_source.interceptors.regex_filter_interceptor.excludeEvents = true
To route events based on headers use Multiplexing Channel Selector:
a1.sources = r1
a1.channels = c1 c2 c3 c4
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2 c3
a1.sources.r1.selector.default = c4
Here events with header "state"="CZ" go to channel "c1", with "state"="US" - to "c2" and "c3", all other - to "c4".
This way you can also filter events by header - just route specific header value to channel, which points to Null Sink.