Concat Avro files using avro-tools

Posted by 眉间皱痕 on 2019-12-20 20:39:03

Question


I'm trying to merge Avro files into one big file. The problem is that the concat command does not accept a wildcard:

hadoop jar avro-tools.jar concat /input/part* /output/bigfile.avro

I get:

Exception in thread "main" java.io.FileNotFoundException: File does not exist: /input/part*

I tried quoting the pattern with "" and '', but that did not help.


Answer 1:


I quickly checked Avro's source code (1.7.7) and it seems that concat does not support glob patterns (basically, they call FileSystem.open() on each argument except the last one).

This means that you have to provide all the filenames explicitly as arguments. It is cumbersome, but the following commands should do what you want:

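# List the matching HDFS files; $NF (the last column of each line) is the path.
IN=$(hadoop fs -ls '/input/part*' | awk '{printf "%s ", $NF}')
# ${IN} is deliberately unquoted so that it splits into one argument per file.
hadoop jar avro-tools.jar concat ${IN} /output/bigfile.avro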

It would be nice if this command supported glob patterns.
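
If the pattern can also match directories, or the listing starts with a "Found N items" header, taking the last column of every line would pass unwanted arguments to concat. A minimal sketch of a stricter filter follows. The helper name list_file_paths is hypothetical, and it assumes the usual `hadoop fs -ls` layout in which the permission string of a regular file starts with "-" and paths contain no whitespace:

```shell
# Hypothetical helper (not part of avro-tools or Hadoop): reads
# `hadoop fs -ls` output on stdin and prints one path per regular file.
list_file_paths() {
  # Keep only lines whose first column (the permission string) starts
  # with "-"; this skips directories ("d...") and the "Found N items" header.
  awk '$1 ~ /^-/ { print $NF }'
}

# Usage sketch, left commented out because it needs a Hadoop installation:
# hadoop jar avro-tools.jar concat \
#   $(hadoop fs -ls '/input/part*' | list_file_paths) /output/bigfile.avro
```

The command substitution is left unquoted for the same reason as before: each path has to become a separate argument.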




Answer 2:


Instead of hadoop jar avro-tools.jar, you can run java -jar avro-tools.jar, since this operation does not require Hadoop.
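
When the input files are on the local filesystem, the shell itself expands the wildcard before java starts, so concat receives an explicit list of files. The sketch below only prints the command it would run. The temporary directory, the file names, and the helper print_concat_command are placeholders made up for illustration:

```shell
# Create two empty placeholder files in a scratch directory.
workdir=$(mktemp -d)
touch "$workdir/part-00000.avro" "$workdir/part-00001.avro"

# Hypothetical helper: prints the command line instead of executing it.
print_concat_command() {
  # All arguments but the last are inputs; the last is the output file.
  echo "java -jar avro-tools.jar concat $*"
}

# The unquoted glob is expanded by the local shell, in sorted order.
print_concat_command "$workdir"/part-*.avro "$workdir/bigfile.avro"

# Remove the scratch directory when done:
# rm -rf "$workdir"
```

Replacing the helper call with the real java -jar invocation should work the same way on real Avro files, provided the glob matches at least one of them.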



Source: https://stackoverflow.com/questions/34856838/concat-avro-files-using-avro-tools
