Question
According to the Flume documentation:
A Flume source consumes events delivered to it by an external source like a web server. The external source sends events to Flume in a format that is recognized by the target Flume source. For example, an Avro Flume source can be used to receive Avro events from Avro clients or other Flume agents in the flow that send events from an Avro sink.
Why does a Flume source need to recognize or understand the format of the message, when all it does is forward the message to one of the channels?
Answer 1:
As far as I understand, Flume wraps the data it transfers in an event packet made up of headers and a payload (the data being transferred). From the documentation:
A Flume event is defined as a unit of data flow having a byte payload and an optional set of string attributes.
This definition comes immediately before the passage you quoted.
The format you specify is the format of the event packet, not the format of your data.
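To make the distinction concrete, here is a minimal Python sketch of the event model described above. The `Event` class is a hypothetical stand-in written for illustration, not Flume's actual Java API; it only mirrors the documented shape of an event (a byte payload plus optional string attributes).

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: string headers plus an opaque byte body."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


# The body can hold anything: plain text, JSON, or arbitrary binary data.
text_event = Event(body=b"127.0.0.1 GET /index.html", headers={"host": "web01"})
json_event = Event(body=b'{"level": "INFO", "msg": "started"}')
binary_event = Event(body=bytes([0x00, 0xFF, 0x10, 0x80]))

for event in (text_event, json_event, binary_event):
    # Nothing here inspects or parses the payload; only its length is read.
    print(len(event.body), event.headers)
```

In this sketch the three events differ only in what their bodies contain, and the code that handles them never needs to know which is which.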
Let's suppose you have this agent:
# Components of the agent named plain_to_avro_translator
plain_to_avro_translator.sources = plain-source avro-source
plain_to_avro_translator.sinks = avro-sink local-file-sink
plain_to_avro_translator.channels = mem-channel1 mem-channel2
# First flow, source: runs a command and turns each line of its output into an event
plain_to_avro_translator.sources.plain-source.channels = mem-channel1
plain_to_avro_translator.sources.plain-source.type = exec
plain_to_avro_translator.sources.plain-source.restart = true
plain_to_avro_translator.sources.plain-source.restartThrottle = 40000
plain_to_avro_translator.sources.plain-source.command = cat /home/user/data.log
# First flow, sink: sends events over Thrift RPC (named avro-sink, but its type is thrift)
plain_to_avro_translator.sinks.avro-sink.channel = mem-channel1
plain_to_avro_translator.sinks.avro-sink.type = thrift
plain_to_avro_translator.sinks.avro-sink.hostname = 192.168.200.43
plain_to_avro_translator.sinks.avro-sink.port = 6000
plain_to_avro_translator.channels.mem-channel1.type = memory
plain_to_avro_translator.channels.mem-channel1.capacity = 100
plain_to_avro_translator.channels.mem-channel1.transactionCapacity = 100
# Second flow, source: listens for Thrift RPC events on port 6000
plain_to_avro_translator.sources.avro-source.channels = mem-channel2
plain_to_avro_translator.sources.avro-source.type = thrift
plain_to_avro_translator.sources.avro-source.bind = 0.0.0.0
plain_to_avro_translator.sources.avro-source.port = 6000
plain_to_avro_translator.channels.mem-channel2.type = memory
plain_to_avro_translator.channels.mem-channel2.capacity = 100
plain_to_avro_translator.channels.mem-channel2.transactionCapacity = 100
# Second flow, sink: writes received events to files in a local directory
plain_to_avro_translator.sinks.local-file-sink.channel = mem-channel2
plain_to_avro_translator.sinks.local-file-sink.type = file_roll
plain_to_avro_translator.sinks.local-file-sink.sink.directory = /home/user/flume_output
This works without problems and does not depend on the format of data.log (you can write whatever you need, in whatever format). If you set the avro-sink type to avro instead of thrift, you will get errors from avro-source, because it expects events in the Thrift format.
The sink and the source both need to know how to parse the event packet.
I hope I understood this correctly. Please correct me if I am wrong.
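The mismatch can be sketched in Python with two made-up wire formats. Neither is real Avro or Thrift; the `MAGIC` marker and both encoders are assumptions invented for illustration. The point is only that the receiver must parse the envelope, whatever the payload contains.

```python
import base64
import json
import struct

MAGIC = b"EVT1"  # hypothetical marker, not part of any real Flume protocol


def encode_binary(headers, body):
    """Hypothetical wire format A: magic, header-JSON length, header JSON, raw body."""
    header_bytes = json.dumps(headers).encode("utf-8")
    return MAGIC + struct.pack(">I", len(header_bytes)) + header_bytes + body


def decode_binary(packet):
    """Parse wire format A; reject anything that does not start with the magic marker."""
    if packet[:4] != MAGIC:
        raise ValueError("not a format-A packet")
    (header_len,) = struct.unpack(">I", packet[4:8])
    headers = json.loads(packet[8:8 + header_len].decode("utf-8"))
    return headers, packet[8 + header_len:]


def encode_json(headers, body):
    """Hypothetical wire format B: one JSON object with a base64-encoded body."""
    doc = {"headers": headers, "body": base64.b64encode(body).decode("ascii")}
    return json.dumps(doc).encode("utf-8")


payload = b"any bytes at all \x00\xff"
headers = {"host": "web01"}

# Matching formats: the payload comes back untouched, whatever it contains.
assert decode_binary(encode_binary(headers, payload)) == (headers, payload)

# Mismatched formats: the receiver cannot even find the event in the packet.
try:
    decode_binary(encode_json(headers, payload))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The payload is identical in both cases; only the envelope differs, and that alone decides whether the receiving side can accept the event.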
Source: https://stackoverflow.com/questions/19520691/why-does-a-flume-source-need-to-recognize-the-format-of-the-message