I recently set up LZO compression in Hadoop. What is the easiest way to compress a file in HDFS? I want to compress a file and then delete the original. Should I create a
Well, if you compress a single file, you may save some space, but you can't really use Hadoop's power to process that file since the decompression has to be done by a single Map task sequentially. If you have lots of files, there's Hadoop Archive, but I'm not sure it includes any kind of compression. The main use case for compression I can think of is compressing the output of Maps to be sent to Reduces (save on network I/O).
Oh, to answer your question more complete, you'd probably need to implement your own RecordReader and/or InputFormat to make sure the entire file got read by a single Map task, and also it used the correct decompression filter.