Hadoop: compress file in HDFS?

逝去的感伤 2020-11-27 18:23

I recently set up LZO compression in Hadoop. What is the easiest way to compress a file in HDFS? I want to compress a file and then delete the original. Should I create a …

7 Answers
  •  醉酒成梦
    2020-11-27 18:47

    @Chitra I cannot comment due to the reputation requirement, so I'm answering here.

    Here is everything in one command: instead of running a second command afterwards, you can produce a single compressed file directly:

    hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
            -Dmapred.reduce.tasks=1 \
            -Dmapred.output.compress=true \
            -Dmapred.compress.map.output=true \
            -Dmapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec \
            -input /input/raw_file \
            -output /archives/ \
            -mapper /bin/cat \
            -reducer /bin/cat \
            -inputformat org.apache.hadoop.mapred.TextInputFormat \
            -outputformat org.apache.hadoop.mapred.TextOutputFormat
    
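    Since the question also asks about deleting the original: after the job finishes, you can sanity-check the compressed output and only then remove the source file. The paths below are the ones assumed in the example above; `hdfs dfs -text` decompresses recognized codecs (including bzip2) on the fly.

    ```shell
    # List the job output; expect a file like part-00000.bz2
    hdfs dfs -ls /archives/

    # Spot-check the decompressed contents before trusting the output
    hdfs dfs -text /archives/part-00000.bz2 | head

    # Only after verifying, delete the original uncompressed file
    hdfs dfs -rm /input/raw_file
    ```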

    Thus, you save a lot of space by ending up with only one compressed file.

    For example, let's say I have 4 files of 10 MB each (plain text, JSON formatted).

    The map-only job gives me 4 files of 650 KB; if I map and reduce, I get 1 file of 1.05 MB.
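    The size difference in that example comes from compressing one large stream instead of several small ones: bzip2, like most codecs, amortizes its per-stream overhead and can find more redundancy in a bigger input. A minimal local sketch with Python's `bz2` module (using made-up JSON-ish data, not the actual files from the example) illustrates the comparison:

    ```python
    import bz2
    import json

    # Hypothetical stand-in for several small JSON text files.
    records = [json.dumps({"id": i, "payload": "x" * 100}) for i in range(8000)]
    chunks = ["\n".join(records[i::4]) for i in range(4)]  # four "files"

    # Compress each chunk on its own (like a map-only job: one output per split).
    separate_total = sum(len(bz2.compress(c.encode())) for c in chunks)

    # Concatenate first, then compress once (like -Dmapred.reduce.tasks=1).
    combined = bz2.compress("\n".join(chunks).encode())

    print("separate:", separate_total, "bytes; combined:", len(combined), "bytes")

    # Round-trip check: the single compressed file still holds all the data.
    assert bz2.decompress(combined).decode() == "\n".join(chunks)
    ```

    Compare the two printed totals on your own data; the exact savings depend on the codec and content.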
