Flume NG and HDFS

流过昼夜 submitted on 2019-12-01 05:29:54

Flume writes to HDFS through the HDFS sink. When Flume starts and begins to receive events, the sink opens a new file and writes events into it. At some point the open file has to be closed; until then, data in the block currently being written is not visible to other readers.

As described in the documentation, the Flume HDFS sink has several strategies for closing (rolling) files:

  • every N seconds (hdfs.rollInterval option)
  • after writing N bytes (hdfs.rollSize option)
  • after writing N events (hdfs.rollCount option)
  • after N seconds of inactivity (hdfs.idleTimeout option)
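To make the interplay of these triggers concrete, here is a minimal Python model of the close decision. It is a simplified sketch, not Flume's own code: the class and function names are hypothetical, and it assumes only that a value of 0 disables a trigger, which is how the Flume options behave. Whichever enabled threshold is reached first closes the file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RollPolicy:
    # Hypothetical mirror of hdfs.rollInterval, hdfs.rollSize,
    # hdfs.rollCount and hdfs.idleTimeout; 0 disables that trigger.
    roll_interval: float = 0  # seconds since the file was opened
    roll_size: int = 0        # bytes written to the file
    roll_count: int = 0       # events written to the file
    idle_timeout: float = 0   # seconds since the last event was written


@dataclass
class OpenFile:
    opened_at: float
    last_write_at: float
    bytes_written: int = 0
    events_written: int = 0


def should_close(f: OpenFile, p: RollPolicy, now: float) -> Optional[str]:
    """Return the name of the first trigger that fires, or None to keep writing."""
    if p.roll_interval > 0 and now - f.opened_at >= p.roll_interval:
        return "rollInterval"
    if p.roll_size > 0 and f.bytes_written >= p.roll_size:
        return "rollSize"
    if p.roll_count > 0 and f.events_written >= p.roll_count:
        return "rollCount"
    if p.idle_timeout > 0 and now - f.last_write_at >= p.idle_timeout:
        return "idleTimeout"
    return None


policy = RollPolicy(roll_interval=300, roll_count=1000)
f = OpenFile(opened_at=0.0, last_write_at=0.0, events_written=1000)
print(should_close(f, policy, now=10.0))  # rollCount
```

With several triggers enabled, the file is closed as soon as any one of them fires, so setting rollSize and rollCount to 0 is the usual way to get purely time-based rolling.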

So, to your questions:

a) Flume writes events to the currently open file until it is closed (and a new file is opened).

b) HDFS supports append, but Flume does not use it. Once a file is closed, Flume does not append any more data to it.

c) To hide the currently open file from MapReduce applications, use the hdfs.inUsePrefix option and set it to ".": files whose names start with "." are not visible to MR jobs.
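The Python sketch below mimics the default hidden-file check of Hadoop's FileInputFormat, which skips input files whose names begin with "_" or ".". The helper names and file names are hypothetical. The in-use naming assumes Flume wraps the final name with hdfs.inUsePrefix and hdfs.inUseSuffix (".tmp" by default) while the file is open, and renames it when the file is closed.

```python
from typing import Iterable, List


def is_visible_to_mapreduce(file_name: str) -> bool:
    # FileInputFormat's default filter ignores names starting with "_" or ".".
    return not (file_name.startswith("_") or file_name.startswith("."))


def in_use_name(final_name: str, in_use_prefix: str = ".",
                in_use_suffix: str = ".tmp") -> str:
    # Hypothetical helper: the name an open file carries before being renamed.
    return f"{in_use_prefix}{final_name}{in_use_suffix}"


def visible_inputs(names: Iterable[str]) -> List[str]:
    return [n for n in names if is_visible_to_mapreduce(n)]


closed = "FlumeData.1575178100000"    # hypothetical, already rolled file
still_open = "FlumeData.1575178194000"  # hypothetical, still being written
listing = [closed, in_use_name(still_open)]
print(visible_inputs(listing))  # ['FlumeData.1575178100000']
```

A ".tmp" suffix on its own does not hide the file: with an empty prefix the open file would be named "FlumeData.1575178194000.tmp", which passes the filter. That is why the prefix, not the suffix, is what matters here.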
