I am new to Hadoop and am trying to process a Wikipedia dump. It's a 6.7 GB gzip-compressed XML file. I read that Hadoop supports gzip-compressed files, but they can only be processed by a single mapper, since gzip is not a splittable format.
Why not ungzip it and recompress with splittable LZO instead?
http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
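As a rough sketch, the recompression and indexing steps could look like this (the dump filename, HDFS paths, and hadoop-lzo jar location are placeholders; `LzoIndexer` is the indexer class from the hadoop-lzo project described in the linked post):

```shell
# Decompress the gzip dump, then recompress with LZO
# (lzop produces enwiki-dump.xml.lzo alongside the input).
# Filename is a placeholder -- use your actual dump file.
gunzip enwiki-dump.xml.gz
lzop enwiki-dump.xml

# Copy the .lzo file into HDFS.
hadoop fs -put enwiki-dump.xml.lzo /wikipedia/

# Build the block index that makes the .lzo file splittable,
# so multiple mappers can process it in parallel.
hadoop jar /path/to/hadoop-lzo.jar \
    com.hadoop.compression.lzo.LzoIndexer \
    /wikipedia/enwiki-dump.xml.lzo
```

Without the index, the LZO file is still readable but will be handled by a single mapper; the indexing step is what enables the splits.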