Append data to existing file in HDFS Java

Posted by 十年热恋 on 2019-11-27 19:07:24

Actually, you can append to an HDFS file:

From the client's perspective, an append starts with a call to append on DistributedFileSystem, which returns a stream object, FSDataOutputStream out. To append data to the file, the client calls out.write to write and out.close to close the stream.

I checked the HDFS sources, and there is a DistributedFileSystem#append method:

 FSDataOutputStream append(Path f, final int bufferSize, final Progressable progress) throws IOException

For details, see presentation.

You can also append from the command line:

hdfs dfs -appendToFile <localsrc> ... <dst>

Add lines directly from stdin:

echo "Line-to-add" | hdfs dfs -appendToFile - <dst>

If append is not available on your cluster (older HDFS releases did not support it), one way to get the same result is:

  • Check whether the file exists.
  • If the file doesn't exist, create a new file and write to it.
  • If the file exists, create a temporary file.
  • Read each line from the original file and write it to the temporary file (don't forget the newline).
  • Write the lines you want to append to the temporary file.
  • Finally, delete the original file and move (rename) the temporary file to the original file's path.
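
The steps above can be sketched as follows. This is only an illustration: it uses java.nio.file on a local file system as a stand-in so it runs without a cluster, and the class name RewriteAppend and the ".tmp" suffix are made up for the example. The comments name the Hadoop FileSystem calls that would play the same role on HDFS.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class RewriteAppend {

    /** Appends lines to target by copying it into a temporary file and renaming. */
    static void appendByRewrite(Path target, List<String> newLines) throws IOException {
        // Step 1: check whether the file exists (on HDFS: FileSystem.exists).
        if (!Files.exists(target)) {
            // Step 2: no file yet, so create it and write (on HDFS: FileSystem.create).
            Files.write(target, newLines, StandardCharsets.UTF_8);
            return;
        }
        // Step 3: create a temporary file next to the original (hypothetical naming scheme).
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        try (BufferedReader in = Files.newBufferedReader(target, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            // Step 4: copy the original line by line, re-adding the newline.
            String line;
            while ((line = in.readLine()) != null) {
                out.write(line);
                out.newLine();
            }
            // Step 5: write the lines to append.
            for (String extra : newLines) {
                out.write(extra);
                out.newLine();
            }
        }
        // Step 6: delete the original and rename the temporary file
        // (on HDFS: FileSystem.delete, then FileSystem.rename).
        Files.delete(target);
        Files.move(temp, target);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("rewrite-append").resolve("doc.txt");
        appendByRewrite(file, Arrays.asList("first line"));
        appendByRewrite(file, Arrays.asList("second line"));
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

Note that this copies the whole file on every append, so the cost grows with the file size, and the file is briefly missing between the delete and the rename.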

Solved..!!

Append is supported in HDFS.

It only takes a little configuration and some simple code, as shown below:

Step 1: Set dfs.support.append to true in hdfs-site.xml (this is needed on older Hadoop releases; in Hadoop 2.x and later, append is enabled by default):

<property>
   <name>dfs.support.append</name>
   <value>true</value>
</property>

Then stop all your daemons with stop-all.sh and restart them with start-all.sh so the change takes effect.

Step 2 (optional): If you have a single-node cluster, set the replication factor to 1 (a single DataNode cannot hold more than one replica of a block), as follows:

From the command line:

./hdfs dfs -setrep -R 1 filepath/directory

Or you can do the same at run time from Java code, using FileSystem#setReplication on an existing file:

fileSystem.setReplication(new Path(filePath), (short) 1);

Step 3: Code for creating the file or appending data to it:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// hdfsuri is assumed to be a field holding the NameNode URI.
public void createAppendHDFS() throws IOException {
    Configuration hadoopConfig = new Configuration();
    hadoopConfig.set("fs.defaultFS", hdfsuri);
    String filePath = "/test/doc.txt";
    Path hdfsPath = new Path(filePath);
    // try-with-resources closes the output stream first, then the file system.
    try (FileSystem fileSystem = FileSystem.get(hadoopConfig)) {
        if (fileSystem.exists(hdfsPath)) {
            // Step 2 (single-node clusters only): the file must exist
            // before its replication factor can be changed.
            fileSystem.setReplication(hdfsPath, (short) 1);
            try (FSDataOutputStream fileOutputStream = fileSystem.append(hdfsPath)) {
                fileOutputStream.writeBytes("appending into file. \n");
            }
        } else {
            try (FSDataOutputStream fileOutputStream = fileSystem.create(hdfsPath)) {
                fileOutputStream.writeBytes("creating and writing into file\n");
            }
        }
    }
}
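
One caveat about writeBytes in the code above: FSDataOutputStream extends java.io.DataOutputStream, whose writeBytes writes only the low-order byte of each char, so text outside the ASCII range does not survive as UTF-8. A minimal sketch of an alternative follows. The class and method names (Utf8Write, writeUtf8) are made up for illustration, and an in-memory stream stands in for the HDFS stream so the example runs without a cluster.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class Utf8Write {

    /** Writes text as UTF-8 bytes to any DataOutputStream (FSDataOutputStream is one). */
    static void writeUtf8(DataOutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        String text = "\u65e5\n"; // one CJK character plus a newline

        // writeBytes keeps one byte per char, so the character is truncated.
        ByteArrayOutputStream lossy = new ByteArrayOutputStream();
        new DataOutputStream(lossy).writeBytes(text);

        // Encoding explicitly keeps the full UTF-8 byte sequence.
        ByteArrayOutputStream exact = new ByteArrayOutputStream();
        writeUtf8(new DataOutputStream(exact), text);

        System.out.println("writeBytes wrote " + lossy.size() + " bytes");
        System.out.println("writeUtf8 wrote " + exact.size() + " bytes");
    }
}
```

For plain ASCII strings like the ones in Step 3, both approaches produce the same bytes.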

Kindly let me know for any other help.

Cheers.!!
