How do I transform events in Flume and send them to another channel?

情到浓时终转凉″ submitted on 2020-01-05 02:50:09

Question


Flume has some ready-made components for transforming events before pushing them further, such as RegexHbaseEventSerializer, which you can plug into an HBaseSink. It is also easy to provide a custom serializer.

I want to process events and send them on to the next channel. The closest thing to what I want is the Regex Extractor Interceptor, which accepts a custom serializer for regex matches. However, it does not replace the event body; it only appends new headers containing the results, which makes the output flow heavier. I'd like to accept large events, such as zipped HTML over 5 KB, parse them, and put many slim messages, such as the URLs found in the pages, into another channel.

                  channel1                channel2
HtmlPagesSource -----------> PageParser -----------> WhateverSinkGoesNext
                    html                    urls
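For illustration, here is a minimal stdlib-only Java sketch of the transformation the PageParser step would perform: gunzip one large HTML event body and emit one slim event per URL. This is not Flume code. SlimEvent is a hypothetical stand-in for org.apache.flume.Event (string headers plus a byte[] body), the "sourcePage" header name is made up, and the regex is a naive href matcher rather than a real HTML parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for a Flume event: string headers plus a byte[] body. */
final class SlimEvent {
    final Map<String, String> headers;
    final byte[] body;

    SlimEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }
}

/** Hypothetical parser: one gzipped HTML page in, many small URL events out. */
final class PageParserSketch {
    // Naive href matcher (double or single quotes, any case); not a full HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    static List<SlimEvent> parse(byte[] gzippedHtml, String pageId) {
        String html;
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzippedHtml))) {
            html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<SlimEvent> out = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            Map<String, String> headers = new LinkedHashMap<>();
            headers.put("sourcePage", pageId); // hypothetical header name
            out.add(new SlimEvent(headers, m.group(1).getBytes(StandardCharsets.UTF_8)));
        }
        return out;
    }

    /** Helper for producing test input: gzip a UTF-8 string. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

Each output event carries only a URL plus a small header, so the downstream channel sees many small events instead of one heavy one.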

Do I have to write a custom sink for that, or is there some type of component that accepts custom serializers, like HBaseSink?

If I write a sink, do I just use the Flume Client SDK and call append(Event) or appendBatch(List) when processing incoming events?
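Whichever client ends up doing the sending, a parser that turns one page into hundreds of URLs should probably group its output rather than send one event per call. Assuming the client behaves like Flume's RpcClient, which offers append(Event), appendBatch(List<Event>) and getBatchSize(), the stdlib-only sketch below chunks a list of events into batches no larger than a given size. BatchSender and Batcher are hypothetical names standing in for the client, not Flume classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the appendBatch call of an RPC client. */
interface BatchSender<E> {
    void appendBatch(List<E> batch);
}

final class Batcher {
    /** Sends events in chunks of at most batchSize; returns how many batches were sent. */
    static <E> int sendInBatches(List<E> events, int batchSize, BatchSender<E> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int from = 0; from < events.size(); from += batchSize) {
            int to = Math.min(from + batchSize, events.size());
            // Copy the subList view so the sender can safely keep the batch.
            sender.appendBatch(new ArrayList<>(events.subList(from, to)));
            batches++;
        }
        return batches;
    }
}
```

With a real client, batchSize would typically come from the client's own configured batch size rather than a hard-coded value.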


Answer 1:


It seems like you need to run two Flume agents:

Agent1: HtmlPagesSource -> channel -> PageParser (extends AvroSink and overrides the process method so that it can parse the input and write many slim messages)

Agent2: AvroSource -> channel -> WhateverSinkGoesNext
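The process() override in Agent1 essentially has to take a batch of pages from its channel, fan each one out into slim messages, forward them, and report READY or BACKOFF the way a Flume sink's process() does. The sketch below models only that control flow with the standard library: a Deque stands in for the channel, a Consumer stands in for the Avro connection to Agent2, and re-queueing on failure stands in for a transaction rollback. In real Flume code those steps would go through the channel's Transaction (begin, take, commit or rollback, close). SinkStatus and ParsingSinkSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical mirror of the READY/BACKOFF values a Flume sink's process() returns. */
enum SinkStatus { READY, BACKOFF }

/** Sketch of Agent1's PageParser loop: take pages, parse, forward slim messages. */
final class ParsingSinkSketch {
    private final Deque<String> channel;                 // stand-in for Agent1's channel
    private final Function<String, List<String>> parser; // page -> extracted URLs
    private final Consumer<List<String>> downstream;     // stand-in for the Avro hop to Agent2
    private final int batchSize;

    ParsingSinkSketch(Deque<String> channel, Function<String, List<String>> parser,
                      Consumer<List<String>> downstream, int batchSize) {
        this.channel = channel;
        this.parser = parser;
        this.downstream = downstream;
        this.batchSize = batchSize;
    }

    SinkStatus process() {
        // "Take" up to batchSize pages, as a sink would inside one transaction.
        List<String> taken = new ArrayList<>();
        while (taken.size() < batchSize && !channel.isEmpty()) {
            taken.add(channel.pollFirst());
        }
        if (taken.isEmpty()) {
            return SinkStatus.BACKOFF; // nothing to do; let the runner back off
        }
        List<String> slim = new ArrayList<>();
        for (String page : taken) {
            slim.addAll(parser.apply(page)); // one big page fans out into many small messages
        }
        try {
            if (!slim.isEmpty()) {
                downstream.accept(slim);
            }
        } catch (RuntimeException e) {
            // Emulate a rollback: put the pages back at the head of the channel, in order.
            for (int i = taken.size() - 1; i >= 0; i--) {
                channel.addFirst(taken.get(i));
            }
            throw e;
        }
        return SinkStatus.READY; // emulates a commit
    }
}
```

Returning BACKOFF on an empty channel lets the runner pause instead of spinning, and restoring the pages in their original order means a failed forward does not lose or reorder input.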

For examples of chaining Flume data flows, see: http://www.ibm.com/developerworks/library/bd-flumews/#N10081



Source: https://stackoverflow.com/questions/21481034/how-do-i-transform-events-in-flume-and-send-them-to-another-channel
