How can I concatenate two files in hadoop into one using Hadoop FS shell?

前端未结

关注

 2  1835

盖世英雄少女心 2021-01-12 16:14

I am working with Hadoop 0.20.2 and would like to concatenate two files into one using the -cat shell command if possible (source: http://hadoop.apache.org/common/docs/r0.19

2条回答

孤独总比滥情好 (楼主)

2021-01-12 16:40
The error relates to you trying to re-direct the standard output of the command back to HDFS. There are ways you can do this, using the hadoop fs -put command with the source argument being a hypen:
```
bin/hadoop fs -cat /user/username/folder/csv1.csv /user/username/folder/csv2.csv | hadoop fs -put - /user/username/folder/output.csv
```
-getmerge also outputs to the local file system, not HDFS

Unforntunatley there is no efficient way to merge multiple files into one (unless you want to look into Hadoop 'appending', but in your version of hadoop, that is disabled by default and potentially buggy), without having to copy the files to one machine and then back into HDFS, whether you do that in
- a custom map reduce job with a single reducer and a custom mapper reducer that retains the file ordering (remember each line will be sorted by the keys, so you key will need to be some combination of the input file name and line number, and the value will be the line itself)
- via the FsShell commands, depending on your network topology - i.e. does your client console have a good speed connection to the datanodes? This certainly is the least effort on your part, and will probably complete quicker than a MR job to do the same (as everything has to go to one machine anyway, so why not your local console?)
0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...