Why is hsync() not flushing my hdfs file?

Submitted by 帅比萌擦擦* on 2020-01-02 08:35:11

Question


Despite all the resources on this subject, I am having trouble flushing my HDFS files to disk (Hadoop 2.6). Calling FSDataOutputStream.hsync() should do the trick, but for some reason it only appears to work the first time.

Here is a simple unit test that fails:

@Test
public void test() throws InterruptedException, IOException {
    final FileSystem filesys = HdfsTools.getFileSystem();
    final Path file = new Path("myHdfsFile"); 
    try (final FSDataOutputStream stream = filesys.create(file)) {
        Assert.assertEquals(0, getSize(filesys, file));  
        stream.writeBytes("0123456789");
        stream.hsync();
        stream.hflush();
        stream.flush();
        Thread.sleep(100);
        Assert.assertEquals(10, getSize(filesys, file)); // Works
        stream.writeBytes("0123456789");
        stream.hsync();
        stream.hflush();
        stream.flush();
        Thread.sleep(100);
        Assert.assertEquals(20, getSize(filesys, file)); // Fails, still 10           
    }
    Assert.assertEquals(20, getSize(filesys, file)); // Works once the stream is closed
}


private long getSize(FileSystem filesys, Path file) throws IOException {
    return filesys.getFileStatus(file).getLen();
}

Any idea why?


Answer 1:


In fact, hsync() internally calls the private flushOrSync(boolean isSync, EnumSet<SyncFlag> syncFlags) with an empty flag set, and the file length is only updated on the NameNode if SyncFlag.UPDATE_LENGTH is provided. The data itself is flushed; it is the length reported by getFileStatus() that lags behind.

In the test above, replacing getSize() with code that actually reads the file makes the assertions pass:

private long getSize(FileSystem filesys, Path file) throws IOException {
    long length = 0;
    // Count the bytes a reader can actually see, instead of trusting the NameNode's metadata.
    try (final FSDataInputStream input = filesys.open(file)) {
        while (input.read() >= 0) {
            length++;
        }
    }
    return length;
}

Alternatively, to make the NameNode update the length, you can call the following (shown here without the type check that the cast should really have):

((DFSOutputStream) stream.getWrappedStream()).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
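
The behaviour described in this answer can be mimicked with a small toy model. This is only a sketch and contains no Hadoop code: ToyFile, ToySyncFlag, metadataLength and newBlockPending are hypothetical names invented for illustration. It also assumes that the first sync after a new block is allocated reports the length even without the flag, which would explain why the first assertion in the question passes; that detail is an assumption, not something stated in the answer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;

// Toy model only: none of these types are Hadoop classes.
class SyncLengthDemo {

    // Hypothetical stand-in for the sync flag discussed above.
    enum ToySyncFlag { UPDATE_LENGTH }

    // Hypothetical stand-in for a file that is open for writing.
    static class ToyFile {
        private final ByteArrayOutputStream clientBuffer = new ByteArrayOutputStream();
        private final ByteArrayOutputStream storedData = new ByteArrayOutputStream();
        private long metadataLength = 0;        // what a getFileStatus()-style call would report
        private boolean newBlockPending = true; // assumption: a fresh block is reported once

        void write(String text) {
            byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
            clientBuffer.write(bytes, 0, bytes.length);
        }

        // Always moves the buffered bytes to storage; only sometimes updates the metadata.
        void sync(EnumSet<ToySyncFlag> flags) {
            byte[] pending = clientBuffer.toByteArray();
            storedData.write(pending, 0, pending.length);
            clientBuffer.reset();
            if (newBlockPending || flags.contains(ToySyncFlag.UPDATE_LENGTH)) {
                metadataLength = storedData.size();
                newBlockPending = false;
            }
        }

        long metadataLength() { return metadataLength; }

        // What a reader that actually reads the bytes would count.
        long readableLength() { return storedData.size(); }
    }

    public static void main(String[] args) {
        ToyFile file = new ToyFile();

        file.write("0123456789");
        file.sync(EnumSet.noneOf(ToySyncFlag.class));
        System.out.println(file.metadataLength() + " / " + file.readableLength()); // 10 / 10

        file.write("0123456789");
        file.sync(EnumSet.noneOf(ToySyncFlag.class));
        System.out.println(file.metadataLength() + " / " + file.readableLength()); // 10 / 20

        file.sync(EnumSet.of(ToySyncFlag.UPDATE_LENGTH));
        System.out.println(file.metadataLength() + " / " + file.readableLength()); // 20 / 20
    }
}
```

In this model the second sync leaves the reported length at 10 while 20 bytes are readable, which matches the failing assertion in the question and the passing result of the read-based getSize().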


Source: https://stackoverflow.com/questions/32231105/why-is-hsync-not-flushing-my-hdfs-file
