Hadoop: compress file in HDFS?

逝去的感伤 2020-11-27 18:23

I recently set up LZO compression in Hadoop. What is the easiest way to compress a file in HDFS? I want to compress a file and then delete the original. Should I create a

7 answers

迷失自我 (OP)

2020-11-27 18:52

Well, if you compress a single file, you may save some space, but you can't really use Hadoop's power to process that file, since with a non-splittable codec the decompression has to be done sequentially by a single Map task (LZO output only becomes splittable once the file has been indexed). If you have lots of files, there's Hadoop Archive, but I'm not sure it includes any kind of compression. The main use case for compression I can think of is compressing the output of Maps before it is sent to Reduces, which saves network I/O.

Oh, to answer your question more completely: you'd probably need to implement your own RecordReader and/or InputFormat to make sure the entire file is read by a single Map task and that the correct decompression filter is used.
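For the "compress, then delete the original" part of the question, here is a minimal sketch of the pattern. It is only a stand-in: it uses the JDK's gzip streams on the local filesystem instead of LZO on HDFS, and the class and method names (`CompressThenDelete`, `compressAndDelete`, `readAll`) are made up for illustration. On a real cluster you would presumably swap the `java.nio.file` streams for ones from Hadoop's `FileSystem` and wrap the output with a Hadoop `CompressionCodec`. The shape stays the same: stream the data through the compressor, and delete the source only after the compressed copy has been closed without error.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressThenDelete {

    /**
     * Streams source into "<name>.gz" next to it, then deletes source.
     * If compression throws, the delete is never reached and the original survives.
     */
    public static Path compressAndDelete(Path source) throws IOException {
        Path target = source.resolveSibling(source.getFileName() + ".gz");
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            in.transferTo(out); // copies in chunks, so large files are not held in memory
        }
        Files.delete(source);
        return target;
    }

    /** Decompresses the whole file from the first byte, as a single reader must. */
    public static byte[] readAll(Path compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data in a temp file, just to exercise the methods.
        Path source = Files.createTempFile("example", ".txt");
        Files.writeString(source, "hello hadoop\n".repeat(1000));

        Path compressed = compressAndDelete(source);
        System.out.println("original still exists: " + Files.exists(source));
        System.out.println("compressed size: " + Files.size(compressed));
        System.out.println("restored size: " + readAll(compressed).length);
    }
}
```

Note that `readAll` has to start at the beginning of the gzip stream; that is the same constraint that forces a single Map task to read a non-splittable compressed file.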
