Hadoop HDFS: Read sequence files that are being written

暖寄归人 2021-01-24 10:23

I am using Hadoop 1.0.3.

I write logs to a Hadoop sequence file in HDFS. I call syncFS() after each batch of logs, but I never close the file (except when I am perform

4 answers
  •  忘掉有多难
    2021-01-24 10:55

    SequenceFile.Reader fails to read a file that is still being written because it relies on the reported file length to decide where the data ends.

    The file length stays at 0 while the first block is being written and is updated only when that block is full (64 MB by default). The reported size then stays at 64 MB until the second block is completely written, and so on.

    That means you cannot read the last, incomplete block of a sequence file with SequenceFile.Reader, even though the raw data is readable directly with FSInputStream.

    Closing the file also corrects the reported file length, but in my case I need to read the files before they are closed.
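
The block-granular length described in this answer can be illustrated with a little arithmetic. The sketch below is not Hadoop code and calls no Hadoop API; it only models the behavior as the answer reports it, assuming the reported length of an open file equals the number of completed blocks times the block size. The class name `VisibleLengthSketch` and the helpers `visibleLength` and `hiddenBytes` are hypothetical names introduced for illustration.

```java
// Model of the behavior described in the answer, not Hadoop code:
// the length reported for a file that is still open only counts full blocks.
final class VisibleLengthSketch {

    // Default block size mentioned in the answer (64 MB).
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;

    /** Length a reader would see for an unclosed file (hypothetical helper). */
    static long visibleLength(long bytesWritten, long blockSize) {
        if (bytesWritten < 0 || blockSize <= 0) {
            throw new IllegalArgumentException(
                    "bytesWritten must be >= 0 and blockSize must be > 0");
        }
        // Integer division drops the partially written last block.
        return (bytesWritten / blockSize) * blockSize;
    }

    /** Bytes in the incomplete last block that a length-based reader cannot reach. */
    static long hiddenBytes(long bytesWritten, long blockSize) {
        return bytesWritten - visibleLength(bytesWritten, blockSize);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long[] written = {10 * mb, 64 * mb, 100 * mb, 130 * mb};
        for (long w : written) {
            System.out.printf("written=%d MB, visible=%d MB, hidden=%d MB%n",
                    w / mb,
                    visibleLength(w, DEFAULT_BLOCK_SIZE) / mb,
                    hiddenBytes(w, DEFAULT_BLOCK_SIZE) / mb);
        }
    }
}
```

Under this model, a writer that has synced 100 MB exposes a length of 64 MB, so the most recent 36 MB stay out of reach of any reader that stops at the reported length, and a file smaller than one block appears empty until it is closed.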
