How to use Hadoop Streaming with LZO-compressed Sequence Files?

抹茶落季 2021-01-13 05:20

I'm trying to play around with the Google ngrams dataset using Amazon's Elastic Map Reduce. There's a public dataset at http://aws.amazon.com/datasets/8172056142375670, a

4 answers
  •  一个人的身影
    2021-01-13 05:51

    LZO is packaged as part of Elastic MapReduce, so there's no need to install anything.

    I just tried this and it works:

     hadoop jar ~hadoop/contrib/streaming/hadoop-streaming.jar \
      -D mapred.reduce.tasks=0 \
      -input s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-all/1gram/ \
      -inputformat SequenceFileAsTextInputFormat \
      -output test_output \
      -mapper org.apache.hadoop.mapred.lib.IdentityMapper
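
    With SequenceFileAsTextInputFormat, streaming hands each record to the mapper as one line of text: the key, a tab, then the value. If you swap the IdentityMapper for your own script, a minimal sketch might look like the one below. The names `strip_key` and `run` are hypothetical, and the sample records are invented for illustration rather than taken from the ngrams data.

```python
import io
import sys


def strip_key(line):
    """Drop the sequence-file key from one 'key<TAB>value' line."""
    # Split on the first tab only, because the value may contain tabs itself.
    key, sep, value = line.rstrip("\n").partition("\t")
    # With no tab present, treat the whole line as the value.
    return value if sep else key


def run(lines, out):
    """Write the value part of every input line to `out`."""
    for line in lines:
        out.write(strip_key(line) + "\n")


# Invented sample records (assumption: not real dataset rows).
# As an actual mapper you would call run(sys.stdin, sys.stdout) instead.
sample = ["1\tcat\t1990\t3\n", "2\tdog\t1991\t5\n"]
buf = io.StringIO()
run(sample, buf)
result = buf.getvalue()
sys.stdout.write(result)
```

    To try it on the cluster, the script would presumably be shipped with streaming's -file option and named in -mapper in place of the IdentityMapper class.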
    
