Why does a Flume source need to recognize the format of the message?

若如初见. Submitted on 2019-12-11 08:48:12

Question


According to the Flume documentation:

A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. For example, an Avro Flume source can be used to receive Avro events from Avro clients or other Flume agents in the flow that send events from an Avro sink.

Why does a Flume source need to recognize or understand the format of the message, when all it does is forward the message to one of its channels?


Answer 1:


From what I've learned, Flume encapsulates the data being transferred in an event packet made up of headers and a payload (the transferred data itself). From the documentation:

A Flume event is defined as a unit of data flow having a byte payload and an optional set of string attributes.

This definition appears immediately before the passage you quoted.
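To make the "byte payload plus optional string headers" definition concrete, here is a minimal Java sketch. `SimpleEvent` is a hypothetical stand-in, not Flume's own class; it only mirrors the `getHeaders()`/`getBody()` accessors of Flume's `org.apache.flume.Event` interface. The point is that the body is just a `byte[]`: nothing in the event itself says whether it holds a log line, JSON or binary data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Flume event: an opaque byte payload plus
// optional string headers. The body is never parsed or inspected.
final class SimpleEvent {
    private final Map<String, String> headers;
    private final byte[] body;

    SimpleEvent(Map<String, String> headers, byte[] body) {
        // Copy defensively so callers cannot mutate the event afterwards.
        this.headers = Collections.unmodifiableMap(new HashMap<>(headers));
        this.body = body.clone();
    }

    Map<String, String> getHeaders() { return headers; }

    // Return a copy so the stored payload stays unchanged.
    byte[] getBody() { return body.clone(); }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("host", "web01"); // hypothetical header
        // The body could equally be JSON, Avro-encoded data or raw bytes.
        byte[] line = "GET /index.html 200".getBytes(StandardCharsets.UTF_8);
        SimpleEvent e = new SimpleEvent(headers, line);
        System.out.println(e.getHeaders() + " " + e.getBody().length + " bytes");
    }
}
```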

The format you specify is the format of the event packet, not the format of your data.
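A rough illustration of this distinction, assuming a made-up envelope format (it is not Avro's or Thrift's actual wire format): `encode` wraps the headers and body into a packet and `decode` unwraps them. Whatever the payload contains, whether plain text, JSON or raw binary, it comes back byte-for-byte, because only the envelope is parsed. `ToyEnvelope` and the header name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy envelope (NOT Avro or Thrift): header count, each header as two
// modified-UTF-8 strings, then a length-prefixed body copied verbatim.
final class ToyEnvelope {
    static byte[] encode(Map<String, String> headers, byte[] body) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(headers.size());
            for (Map.Entry<String, String> h : headers.entrySet()) {
                out.writeUTF(h.getKey());
                out.writeUTF(h.getValue());
            }
            out.writeInt(body.length);
            out.write(body); // the payload is copied, never inspected
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses the envelope, fills headersOut and returns the body bytes.
    static byte[] decode(byte[] packet, Map<String, String> headersOut) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(packet));
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String key = in.readUTF();
                headersOut.put(key, in.readUTF());
            }
            byte[] body = new byte[in.readInt()];
            in.readFully(body);
            return body;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("source", "data.log"); // hypothetical header
        byte[][] payloads = {
            "plain text line".getBytes(StandardCharsets.UTF_8),
            "{\"user\":42}".getBytes(StandardCharsets.UTF_8),
            {0, (byte) 0xFF, 0x10} // arbitrary binary data
        };
        for (byte[] p : payloads) {
            Map<String, String> got = new LinkedHashMap<>();
            byte[] back = decode(encode(headers, p), got);
            System.out.println(Arrays.equals(p, back) && got.equals(headers));
        }
    }
}
```

Running `main` should print `true` three times: the envelope format is fixed, while the payload format is irrelevant to it.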

Let's suppose you have this agent:

# One agent, two flows. Despite their names, avro-sink and avro-source
# are both of type thrift, so they speak the same event-packet format.
plain_to_avro_translator.sources = plain-source avro-source
plain_to_avro_translator.sinks = avro-sink local-file-sink
plain_to_avro_translator.channels = mem-channel1 mem-channel2

# Flow 1: run "cat" on data.log; each output line becomes an event
plain_to_avro_translator.sources.plain-source.channels = mem-channel1
plain_to_avro_translator.sources.plain-source.type = exec
plain_to_avro_translator.sources.plain-source.restart = true
plain_to_avro_translator.sources.plain-source.restartThrottle = 40000
plain_to_avro_translator.sources.plain-source.command = cat /home/user/data.log

# Thrift sink: serializes events and sends them to port 6000
plain_to_avro_translator.sinks.avro-sink.channel = mem-channel1
plain_to_avro_translator.sinks.avro-sink.type = thrift
plain_to_avro_translator.sinks.avro-sink.hostname = 192.168.200.43
plain_to_avro_translator.sinks.avro-sink.port = 6000

plain_to_avro_translator.channels.mem-channel1.type = memory
plain_to_avro_translator.channels.mem-channel1.capacity = 100
plain_to_avro_translator.channels.mem-channel1.transactionCapacity = 100

# Flow 2: Thrift source listening on port 6000 parses the incoming packets
plain_to_avro_translator.sources.avro-source.channels = mem-channel2
plain_to_avro_translator.sources.avro-source.type = thrift
plain_to_avro_translator.sources.avro-source.bind = 0.0.0.0
plain_to_avro_translator.sources.avro-source.port = 6000

plain_to_avro_translator.channels.mem-channel2.type = memory
plain_to_avro_translator.channels.mem-channel2.capacity = 100
plain_to_avro_translator.channels.mem-channel2.transactionCapacity = 100

# Write the received events to rolling files in a local directory
plain_to_avro_translator.sinks.local-file-sink.channel = mem-channel2
plain_to_avro_translator.sinks.local-file-sink.type = file_roll
plain_to_avro_translator.sinks.local-file-sink.sink.directory = /home/user/flume_output

This works without problems and does not depend on the format of data.log (you can write whatever you need, in whatever format). If you change the avro-sink type from thrift to avro, however, avro-source will report errors, because it expects Thrift-formatted event packets.

The sink and the source must agree on how the event packet is encoded, so that the source can parse what the sink sends.
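The following Java sketch shows why such a mismatch fails, using two invented wire formats (`FORMAT_A` and `FORMAT_B` are hypothetical and far simpler than Avro or Thrift). A body framed by a "format B" sink is readable by a "format B" parser, but a "format A" parser rejects the packet before it ever reaches the payload, much like the thrift/avro mismatch described in the answer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Two hypothetical wire formats (neither is real Avro or Thrift). Both
// frame an opaque body as: 1-byte format tag, 4-byte length, body bytes.
enum ToyWireFormat {
    FORMAT_A((byte) 'A'),
    FORMAT_B((byte) 'B');

    private final byte tag;

    ToyWireFormat(byte tag) { this.tag = tag; }

    // Sink side: frame the event body for the network.
    byte[] encode(byte[] body) {
        return ByteBuffer.allocate(1 + 4 + body.length)
                .put(tag).putInt(body.length).put(body).array();
    }

    // Source side: parse a frame, rejecting packets in a foreign format.
    byte[] decode(byte[] packet) {
        ByteBuffer in = ByteBuffer.wrap(packet);
        if (in.remaining() < 5 || in.get() != tag) {
            throw new IllegalArgumentException(name() + " cannot parse this packet");
        }
        int len = in.getInt();
        if (len < 0 || len != in.remaining()) {
            throw new IllegalArgumentException("bad length " + len);
        }
        byte[] body = new byte[len];
        in.get(body);
        return body;
    }

    public static void main(String[] args) {
        byte[] body = "any payload".getBytes(StandardCharsets.UTF_8);
        byte[] packet = FORMAT_B.encode(body); // the sink speaks format B
        // prints "any payload"
        System.out.println(new String(FORMAT_B.decode(packet), StandardCharsets.UTF_8));
        try {
            FORMAT_A.decode(packet); // a source expecting format A
        } catch (IllegalArgumentException e) {
            // prints "rejected: FORMAT_A cannot parse this packet"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The payload is identical in both cases; only the framing around it differs, and that framing is what the source has to recognize.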

I hope I got this right; please correct me if I am wrong.



Source: https://stackoverflow.com/questions/19520691/why-does-a-flume-source-need-to-recognize-the-format-of-the-message
