Question
I have some files in HDFS in Parquet format. I would like to merge these files into a single large file. How can I do that?
I have done something like the following, but only for text files:
hadoop fs -cat /input_hdfs_dir/* | hadoop fs -put - /output_hdfs_file
However, I am unable to achieve the desired result with files in Parquet format.
How can I do this?
Answer 1:
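It's not possible to merge Parquet files with HDFS commands. Each Parquet file carries its own footer metadata, so concatenating the raw bytes does not produce a valid Parquet file.
There is a parquet-tools library that can help you merge Parquet files. The command should be as follows, with merge as the <command>: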
java -jar ./parquet-tools-<VERSION>.jar <command> <input-directory> <output-file>
Answer 2:
The same tool can be used to merge multiple files stored inside Hadoop: just use hadoop jar instead of java -jar before ./parquet-tools-<VERSION>.jar.
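To make the two invocations concrete, here is a minimal dry-run sketch. It only prints the command lines rather than executing them, so it can be tried on a machine without Hadoop. It assumes that your parquet-tools version provides a merge command; the helper name build_merge_cmd is hypothetical, the paths are the ones from the question, and <VERSION> is left as a placeholder that you must fill in yourself.

```shell
#!/bin/sh
# Dry-run sketch: prints the merge command instead of executing it.
# JAR, INPUT_DIR and OUTPUT_FILE are placeholders; replace them with real values.
JAR="./parquet-tools-<VERSION>.jar"
INPUT_DIR="/input_hdfs_dir"
OUTPUT_FILE="/output_hdfs_file"

# build_merge_cmd LAUNCHER JAR INPUT OUTPUT   (hypothetical helper)
# LAUNCHER is "java -jar" for a local run, or "hadoop jar" to run against HDFS.
build_merge_cmd() {
    printf '%s %s merge %s %s\n' "$1" "$2" "$3" "$4"
}

# Print both variants mentioned in the answers.
build_merge_cmd "java -jar" "$JAR" "$INPUT_DIR" "$OUTPUT_FILE"
build_merge_cmd "hadoop jar" "$JAR" "$INPUT_DIR" "$OUTPUT_FILE"
```

Once the printed line looks right, you can run it directly in a shell where Hadoop is configured. It is probably worth reading the merged file back afterwards (for example with the same tool's inspection commands) before deleting the originals.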
Source: https://stackoverflow.com/questions/44400331/merge-two-parquet-files-in-hdfs