I'm trying to play around with the Google ngrams dataset using Amazon's Elastic MapReduce. There's a public dataset at http://aws.amazon.com/datasets/8172056142375670, a
You may want to look at hadoop-lzo, which adds LZO compression support to Hadoop: https://github.com/kevinweil/hadoop-lzo