What is the most efficient way to write from Kafka to HDFS with files partitioned by date?

囚心锁ツ 2020-12-28 20:53

I'm working on a project that should write from Kafka to HDFS. Suppose there is an online server that writes messages into Kafka. Each message includes a timestamp. I w

5 answers
  •  死守一世寂寞
    2020-12-28 21:39

    If you're looking for a more real-time approach, you should check out StreamSets Data Collector. It's also an Apache-licensed, open source tool for data ingest.

    The HDFS destination can be configured to write to time-based directories, following a directory template you specify. It also lets you pick a field in your incoming messages that determines the time under which a message is written. The config is called "Time Basis", and you can specify something like ${record:value("/ts")}.
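
    To illustrate the idea of using a field in the message as the time basis, here is a minimal Python sketch. It is not StreamSets code and does not use its API. The function names, the /data/events base directory, and the assumption that the "ts" field holds epoch milliseconds in UTC are all hypothetical; the question does not say what the timestamp looks like.

```python
from datetime import datetime, timezone


def partition_dir(record, base_dir="/data/events", ts_field="ts"):
    """Return the date-partitioned directory for a single record.

    Assumption: record[ts_field] is an event time in epoch milliseconds.
    The directory is derived from the message's own timestamp, not from
    the time at which the message happens to be processed.
    """
    event_time = datetime.fromtimestamp(record[ts_field] / 1000, tz=timezone.utc)
    return f"{base_dir}/{event_time:%Y-%m-%d}"


def group_by_partition(records, base_dir="/data/events", ts_field="ts"):
    """Group records by target directory so each group can be written to its own file."""
    groups = {}
    for record in records:
        target = partition_dir(record, base_dir=base_dir, ts_field=ts_field)
        groups.setdefault(target, []).append(record)
    return groups
```

    A late message carrying a timestamp from the previous day would land in the previous day's directory, which is the behaviour the "Time Basis" setting is meant to give you.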

    *Full disclosure: I'm an engineer working on this tool.
