Append data to existing file in HDFS Java

Submitted by 本小妞迷上赌 on 2019-12-28 05:23:23

Question


I'm having trouble appending data to an existing file in HDFS. If the file exists, I want to append a line to it; if not, I want to create a new file with the given name.

Here's my method to write into HDFS.

if (!file.exists(path)){
   file.createNewFile(path);
}

FSDataOutputStream fileOutputStream = file.append(path); 
BufferedWriter br = new BufferedWriter(new OutputStreamWriter(fileOutputStream));
br.append("Content: " + content + "\n");
br.close();

This method does write to HDFS and creates the file, but, as I mentioned, it does not append.

This is how I test my method:

RunTimeCalculationHdfsWrite.hdfsWriteFile("RunTimeParserLoaderMapperTest2", "Error message test 2.2", context, null);

The first parameter is the name of the file, the second is the message, and the other two parameters are not important.

Does anyone have an idea what I'm missing or doing wrong?


Answer 1:


Actually, you can append to an HDFS file:

From the perspective of the client, an append operation first calls append on DistributedFileSystem, which returns a stream object, FSDataOutputStream out. To append data to the file, the client calls out.write to write and out.close to close.

I checked the HDFS sources, and there is a DistributedFileSystem#append method:

 FSDataOutputStream append(Path f, final int bufferSize, final Progressable progress) throws IOException

For details, see presentation.

You can also append from the command line:

hdfs dfs -appendToFile <localsrc> ... <dst>

Add lines directly from stdin:

echo "Line-to-add" | hdfs dfs -appendToFile - <dst>



Answer 2:


Solved..!!

Append is supported in HDFS.

You just need some configuration and a little code, as shown below:

Step 1: Set dfs.support.append to true in hdfs-site.xml:

<property>
   <name>dfs.support.append</name>
   <value>true</value>
</property>

Stop all your daemon services using stop-all.sh and restart them using start-all.sh.

Step 2 (optional): Only if you have a single-node cluster, set the replication factor to 1 as below:

From the command line:

./hdfs dfs -setrep -R 1 filepath/directory

Or do the same at run time from Java code, using FileSystem#setReplication:

fileSystem.setReplication(new Path(filePath), (short) 1);

Step 3: Code for creating the file or appending data to it:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public void createAppendHDFS() throws IOException {
    Configuration hadoopConfig = new Configuration();
    // hdfsuri is assumed to be a field holding the NameNode URI.
    hadoopConfig.set("fs.defaultFS", hdfsuri);
    FileSystem fileSystem = FileSystem.get(hadoopConfig);
    Path hdfsPath = new Path("/test/doc.txt");
    FSDataOutputStream fileOutputStream = null;
    try {
        if (fileSystem.exists(hdfsPath)) {
            // Step 2, single-node clusters only: keep the replication factor at 1.
            fileSystem.setReplication(hdfsPath, (short) 1);
            fileOutputStream = fileSystem.append(hdfsPath);
            fileOutputStream.writeBytes("appending into file. \n");
        } else {
            fileOutputStream = fileSystem.create(hdfsPath);
            fileOutputStream.writeBytes("creating and writing into file\n");
        }
    } finally {
        // Close the stream before the file system so the written data is flushed.
        if (fileOutputStream != null) {
            fileOutputStream.close();
        }
        fileSystem.close();
    }
}

Let me know if you need any other help.

Cheers.!!




Answer 3:


Where HDFS does not allow append operations (older releases, or clusters with append disabled), one way to get the same effect as appending is:

  • Check whether the file exists.
  • If the file doesn't exist, create a new file and write to it.
  • If the file exists, create a temporary file.
  • Read each line from the original file and write it to the temporary file (don't forget the newline).
  • Write the lines you want to append to the temporary file.
  • Finally, delete the original file and move (rename) the temporary file to the original file's name.
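
The steps above can be sketched in Java. So that it runs without a cluster, this minimal sketch uses java.nio.file on the local file system rather than the Hadoop API; on HDFS the corresponding FileSystem calls would presumably be exists, open, create, delete and rename. The class name AppendByRewrite, the method appendLines and the ".tmp" suffix are hypothetical names chosen for illustration, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class AppendByRewrite {

    /** Emulates append by rewriting the target through a temporary file. */
    public static void appendLines(Path target, List<String> newLines) throws IOException {
        // The file does not exist yet: create it and write the new lines directly.
        if (!Files.exists(target)) {
            Files.write(target, newLines, StandardCharsets.UTF_8);
            return;
        }

        // The file exists: build the new contents in a temporary sibling file.
        // The ".tmp" suffix is an illustrative assumption.
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(target, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            String existing;
            while ((existing = reader.readLine()) != null) {
                writer.write(existing);
                writer.newLine(); // readLine() strips the line terminator, so add it back
            }
            for (String added : newLines) {
                writer.write(added);
                writer.newLine();
            }
        }

        // Delete the original, then rename the temporary file to the original name.
        Files.delete(target);
        Files.move(temp, target);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempDirectory("append-demo").resolve("doc.txt");
        appendLines(file, Arrays.asList("first line"));  // creates the file
        appendLines(file, Arrays.asList("second line")); // rewrites it with one more line
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        // prints [first line, second line]
    }
}
```

Note that this approach copies the whole file on every append, so its cost grows with the file size, and a failure between the delete and the rename leaves only the temporary file behind.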


Source: https://stackoverflow.com/questions/22997137/append-data-to-existing-file-in-hdfs-java
