I am new to Hadoop and trying to process a Wikipedia dump. It's a 6.7 GB gzip-compressed XML file. I read that Hadoop supports gzip-compressed files, but they can only be processed by a single mapper because gzip is not splittable. Is there a way around this?
GZIP files cannot be split, because the codec does not support starting decompression in the middle of the stream. 6.7 GB really isn't that big, so just decompress it on a single machine (it will take less than an hour) and copy the XML up to HDFS. Then you can process the Wikipedia XML in Hadoop.
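If you'd rather not write the uncompressed XML to local disk first, you can decompress while streaming straight into HDFS. Here is a minimal sketch using the JDK's `GZIPInputStream` and Hadoop's `FileSystem` API; the class name and argument handling are just placeholders, and it assumes your `core-site.xml` points `FileSystem.get()` at the cluster:

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class DecompressToHdfs {
    public static void main(String[] args) throws Exception {
        // args[0]: local path to the .xml.gz dump, args[1]: destination path in HDFS
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // assumes fs.defaultFS is your cluster

        // Decompress while streaming directly into HDFS, so the uncompressed
        // XML never has to fit on the local disk.
        try (InputStream in = new GZIPInputStream(new FileInputStream(args[0]));
             OutputStream out = fs.create(new Path(args[1]))) {
            IOUtils.copyBytes(in, out, 64 * 1024, false);
        }
    }
}
```

Plain `gunzip` followed by `hadoop fs -put` works just as well if you have the local disk space.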
Cloud9 contains a WikipediaPageInputFormat class that you can use to read the XML in Hadoop.
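Once the uncompressed XML is in HDFS, a job driver using that input format looks roughly like the sketch below, which just counts pages. Note the assumptions: it uses the new `mapreduce` API and assumes `WikipediaPageInputFormat` delivers `LongWritable` keys and `WikipediaPage` values; older Cloud9 releases use the old `mapred` API, so check the javadoc of the version you depend on.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

import edu.umd.cloud9.collection.wikipedia.WikipediaPage;
import edu.umd.cloud9.collection.wikipedia.WikipediaPageInputFormat;

public class CountWikipediaPages {

    // Emits ("pages", 1) for every page the input format hands us.
    // Key/value types assume the new-API version of WikipediaPageInputFormat.
    public static class PageMapper
            extends Mapper<LongWritable, WikipediaPage, Text, IntWritable> {
        private static final Text KEY = new Text("pages");
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, WikipediaPage page, Context context)
                throws IOException, InterruptedException {
            context.write(KEY, ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0]: uncompressed dump XML in HDFS, args[1]: output directory
        Job job = Job.getInstance(new Configuration(), "count wikipedia pages");
        job.setJarByClass(CountWikipediaPages.class);

        job.setInputFormatClass(WikipediaPageInputFormat.class);
        job.setMapperClass(PageMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the XML is uncompressed in HDFS, the input format can split it and you get one mapper per split instead of a single mapper grinding through the whole 6.7 GB.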