HDFS sink: “clever” folder routing

Posted by 时间秒杀一切 on 2019-12-12 10:21:37

Question


I am new to Flume (and to HDFS), so I hope my question is not stupid.

I have a multi-tenant application (about 100 different customers as of now). I have 16 different data types.

(In production, we have approx. 15 million messages/day through our RabbitMQ)

I want to write all my events to HDFS, separated by tenant, data type, and date, like this:

/data/{tenant}/{data_type}/2014/10/15/file-08.csv

Is this possible with a single sink definition? I don't want to duplicate configuration, and new clients arrive every week or so.

In the documentation, I see:

agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/events/%Y/%m/%d/%H/

Is something like this possible?

agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/events/%tenant/%type/%Y/%m/%d/%H/

I want to write to different folders according to my incoming data.


Answer 1:


Yes, this is possible. You can use either event metadata (headers) or a field in the incoming data to choose the output directory. The HDFS sink replaces %{header_name} in hdfs.path with the value of that event header, so the syntax is %{tenant} rather than %tenant.

For example, in my case I receive different types of log data and want to store each type in its own folder. The first word of each log line is the file name, so I extract it into a header and use that header in the path. Here is the config snippet:

Interceptor:

dataplatform.sources.source1.interceptors = i3
dataplatform.sources.source1.interceptors.i3.type = regex_extractor
dataplatform.sources.source1.interceptors.i3.regex = ^(\\w*)\t.*
dataplatform.sources.source1.interceptors.i3.serializers = s1
dataplatform.sources.source1.interceptors.i3.serializers.s1.name = filename

HDFS sink:

dataplatform.sinks.sink1.type = hdfs
dataplatform.sinks.sink1.hdfs.path = hdfs://server/events/provider=%{filename}/years=%Y/months=%Y%m/days=%Y%m%d/hours=%H
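Applied to the tenant and data-type layout from the question, the same approach might look like the sketch below. It rests on assumptions that are not in the original post: that each event body begins with the tenant and the data type as tab-separated fields, that the source is named source1, and that the headers are called tenant and data_type. The hdfs.useLocalTimeStamp line is there because the time escapes (%Y, %m, %d) need a timestamp to work from.

```properties
# Sketch only: assumes each event body starts with "<tenant>\t<data_type>\t..."
# (a hypothetical layout; adjust the regex to the real message format).
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = regex_extractor
agent1.sources.source1.interceptors.i1.regex = ^(\\w+)\\t(\\w+)\\t.*
# One serializer per capture group; each name becomes an event header.
agent1.sources.source1.interceptors.i1.serializers = s1 s2
agent1.sources.source1.interceptors.i1.serializers.s1.name = tenant
agent1.sources.source1.interceptors.i1.serializers.s2.name = data_type

agent1.sinks.hdfs-sink1.type = hdfs
# %{tenant} and %{data_type} are filled from the headers set above.
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/data/%{tenant}/%{data_type}/%Y/%m/%d
# Use the agent's local clock for %Y/%m/%d instead of requiring a
# "timestamp" header on every event.
agent1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
```

With a setup like this, a new tenant should need no configuration change, since the sink resolves the path per event. Note that \\w+ only matches letters, digits, and underscores, so tenant names containing other characters would need a different pattern.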

Hope this helps.




Answer 2:


A possible solution is to write a custom interceptor that passes the tenant value along with each event as a header.

See the link below for an introduction to interceptors:

http://hadoopi.wordpress.com/2014/06/11/flume-getting-started-with-interceptors/
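Wiring such an interceptor into the agent could look roughly like the sketch below. The class com.example.flume.TenantInterceptor is hypothetical and would have to be written and placed on the Flume classpath; the source name source1 and the header name tenant are likewise assumptions made for illustration.

```properties
# Sketch only: com.example.flume.TenantInterceptor is a hypothetical class
# implementing org.apache.flume.interceptor.Interceptor that sets a
# "tenant" header on each event.
agent1.sources.source1.interceptors = tenant
# Custom interceptors are referenced by the fully qualified name of their Builder.
agent1.sources.source1.interceptors.tenant.type = com.example.flume.TenantInterceptor$Builder

agent1.sinks.hdfs-sink1.type = hdfs
# The header set by the interceptor selects the output directory.
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server/data/%{tenant}/%Y/%m/%d
agent1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
```

A custom interceptor is mainly worth the effort when the tenant cannot be pulled out of the event body with a regular expression, for example when it has to be looked up or decoded.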



Source: https://stackoverflow.com/questions/26385035/hdfs-sink-clever-folder-routing
