I recently set up LZO compression in Hadoop. What is the easiest way to compress a file in HDFS? I want to compress a file and then delete the original. Should I create an MR job with an identity mapper and an identity reducer that uses LZO compression?
@Chitra I cannot comment due to reputation issues, so I am answering here.
Here is everything in one command: instead of running a second command, you can reduce directly into a single compressed file:
hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
-Dmapred.reduce.tasks=1 \
-Dmapred.output.compress=true \
-Dmapred.compress.map.output=true \
-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec \
-input /input/raw_file \
-output /archives/ \
-mapper /bin/cat \
-reducer /bin/cat \
-inputformat org.apache.hadoop.mapred.TextInputFormat \
-outputformat org.apache.hadoop.mapred.TextOutputFormat
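A note on the configuration keys: the mapred.* names above are deprecated aliases in Hadoop 2.x. The command still works, but the same job with the current mapreduce.* property names would look roughly like this (a sketch assuming Hadoop 2.x or later; if you have hadoop-lzo set up, you could swap the codec class for com.hadoop.compression.lzo.LzopCodec):

# Same job using the non-deprecated mapreduce.* property names
hadoop jar share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar \
-Dmapreduce.job.reduces=1 \
-Dmapreduce.output.fileoutputformat.compress=true \
-Dmapreduce.map.output.compress=true \
-Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.BZip2Codec \
-input /input/raw_file \
-output /archives/ \
-mapper /bin/cat \
-reducer /bin/cat \
-inputformat org.apache.hadoop.mapred.TextInputFormat \
-outputformat org.apache.hadoop.mapred.TextOutputFormat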
This way you save a lot of space by ending up with only one compressed file.
For example, let's say I have 4 files of 10 MB each (plain text, JSON formatted).
A map-only job gives me 4 files of 650 KB each; if I map and reduce, I get a single file of 1.05 MB.
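Since the question also asks about deleting the original after compressing: once the job finishes, you can check the output and remove the source with plain HDFS shell commands (a sketch using the paths from the example above; part-00000.bz2 is the typical default name for a single compressed reducer output, but yours may differ):

# Inspect the size of the compressed output
hdfs dfs -du -h /archives/
# Sanity-check the contents (hdfs dfs -text decompresses known codecs)
hdfs dfs -text /archives/part-00000.bz2 | head
# Remove the original once you are satisfied
hdfs dfs -rm /input/raw_file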