Merge two parquet files in HDFS

Submitted by 爷,独闯天下 on 2019-12-25 08:37:18

Question


I have some files in HDFS in parquet format. I would like to merge these files into one single large file.

How can I do that?

I have done something like the following, but only for text files.

hadoop fs -cat /input_hdfs_dir/* | hadoop fs -put - /output_hdfs_file

But I am unable to achieve the desired result with files in Parquet format.

How can I achieve my requirement?


Answer 1:


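It's not possible to merge Parquet files with HDFS commands. Each Parquet file ends with its own footer metadata, so concatenating the raw bytes with hadoop fs -cat does not produce a valid Parquet file.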

The parquet-tools library can merge Parquet files for you. The command should be

java -jar ./parquet-tools-<VERSION>.jar <command> <input-directory> <output-file>

where <command> is merge.



Answer 2:


The same tool can merge multiple files that live inside Hadoop: just use hadoop jar instead of java -jar before ./parquet-tools-<VERSION>.jar.
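
The two answers can be combined into a small wrapper. This is only a sketch: the function name merge_parquet and the DRY_RUN and PARQUET_TOOLS_JAR variables are hypothetical, not part of parquet-tools, and the <VERSION> placeholder is kept from the answer above, so point PARQUET_TOOLS_JAR at the jar you actually have. It assumes that hadoop is on the PATH and that your parquet-tools build includes the merge command.

```shell
#!/bin/sh
# Hypothetical helper around "hadoop jar parquet-tools ... merge".
# PARQUET_TOOLS_JAR is an assumed variable; set it to your real jar path.
PARQUET_TOOLS_JAR="${PARQUET_TOOLS_JAR:-./parquet-tools-<VERSION>.jar}"

merge_parquet() {
    input_dir="$1"     # HDFS directory holding the Parquet files
    output_file="$2"   # HDFS path of the merged Parquet file

    if [ -z "$input_dir" ] || [ -z "$output_file" ]; then
        echo "usage: merge_parquet <input-directory> <output-file>" >&2
        return 2
    fi

    if [ -n "$DRY_RUN" ]; then
        # Print the command instead of running it, to check it first.
        echo "hadoop jar $PARQUET_TOOLS_JAR merge $input_dir $output_file"
    else
        hadoop jar "$PARQUET_TOOLS_JAR" merge "$input_dir" "$output_file"
    fi
}
```

For example, DRY_RUN=1 merge_parquet /input_hdfs_dir /output_hdfs_file prints the command without touching HDFS. Note that merge places the existing row groups one after another rather than rewriting them, so merging many small files can still leave a file with many small row groups.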



Source: https://stackoverflow.com/questions/44400331/merge-two-parquet-files-in-hdfs
