Merging multiple files into one within Hadoop

遇见更好的自我 2020-12-01 02:18

Multiple small files arrive in my input directory, and I want to merge them into a single file without using the local file system or writing MapReduce jobs. Is there a way I could do

8 answers
  •  醉酒成梦
    2020-12-01 02:49

    To keep everything on the grid, use Hadoop Streaming with a single reducer and cat as both the mapper and the reducer (effectively a no-op). Compression can be added with MapReduce flags.

    hadoop jar \
        $HADOOP_PREFIX/share/hadoop/tools/lib/hadoop-streaming.jar \
        -Dmapred.reduce.tasks=1 \
        -Dmapred.job.queue.name=$QUEUE \
        -input "$INPUT" \
        -output "$OUTPUT" \
        -mapper cat \
        -reducer cat

    If you want compressed output, add these alongside the other -D options (generic options must come before -input):

    -Dmapred.output.compress=true \
    -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
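The pieces above can be put together in a small wrapper. This is only a sketch: the function name build_merge_cmd and the STREAMING_JAR override are hypothetical and not part of Hadoop, and the function prints the command one argument per line as a dry run instead of executing it. QUEUE and HADOOP_PREFIX are the same variables used in the answer.

```shell
#!/bin/sh
# Sketch only: build_merge_cmd and STREAMING_JAR are illustrative names,
# not part of Hadoop. The function prints the arguments (a dry run).

# Usage: build_merge_cmd INPUT OUTPUT [true|false]
build_merge_cmd() {
    input="$1"
    output="$2"
    compress="${3:-false}"
    jar="${STREAMING_JAR:-${HADOOP_PREFIX:-}/share/hadoop/tools/lib/hadoop-streaming.jar}"

    # Generic -D options go first, before any streaming options.
    set -- hadoop jar "$jar" -Dmapred.reduce.tasks=1
    if [ -n "${QUEUE:-}" ]; then
        set -- "$@" "-Dmapred.job.queue.name=$QUEUE"
    fi
    if [ "$compress" = "true" ]; then
        set -- "$@" -Dmapred.output.compress=true \
            -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec
    fi

    # cat as mapper and reducer passes records through unchanged.
    set -- "$@" -input "$input" -output "$output" -mapper cat -reducer cat

    # Print one argument per line; replace this line with "$@" to run it.
    printf '%s\n' "$@"
}
```

Calling build_merge_cmd "$INPUT" "$OUTPUT" true shows the full argument list with the compression flags in place, which makes it easy to check the ordering before submitting the job.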
