Chaining multiple MapReduce tasks in Hadoop Streaming

余生分开走 2020-12-15 12:45

I am in a scenario where I have two MapReduce jobs. I am more comfortable with Python and am planning to use it for writing the MapReduce scripts, and to use Hadoop Streaming for the same...
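A minimal way to chain two Streaming jobs without any extra framework is a small Python driver that runs them one after another and points the second job's `-input` at the first job's `-output` directory. This is a sketch under stated assumptions, not the Cascading approach described in the answer: the jar path, HDFS paths, script names and the helpers `streaming_cmd`/`run_chain` are hypothetical. It assumes Hadoop 2 or later, where `-files` is a generic option that must come before the Streaming options (older releases used `-file` once per script), and that each script has a shebang line and the executable bit set.

```python
# Minimal sketch: chain Hadoop Streaming jobs from a Python driver.
# Hypothetical: the jar path, HDFS paths and script names used here.
import subprocess
from typing import Callable, List, Sequence, Tuple

# Hypothetical location; the real path depends on your Hadoop version/distro.
STREAMING_JAR = "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar"


def streaming_cmd(input_path: str, output_path: str, mapper: str,
                  reducer: str, jar: str = STREAMING_JAR) -> List[str]:
    """Build the `hadoop jar` command line for one Streaming job."""
    return [
        "hadoop", "jar", jar,
        # Generic option: must precede the Streaming options; ships the scripts.
        "-files", f"{mapper},{reducer}",
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,  # must not exist yet, or Hadoop refuses to run
    ]


def run_chain(input_path: str,
              stages: Sequence[Tuple[str, str, str]],
              runner: Callable = subprocess.run) -> List[List[str]]:
    """Run (mapper, reducer, output_path) stages in order.

    Each stage reads the previous stage's output directory. With
    check=True, subprocess.run raises CalledProcessError on a non-zero
    exit, so a failed job stops the chain before later jobs start.
    """
    executed = []
    current_input = input_path
    for mapper, reducer, output_path in stages:
        cmd = streaming_cmd(current_input, output_path, mapper, reducer)
        runner(cmd, check=True)
        executed.append(cmd)
        current_input = output_path  # next job consumes this job's output
    return executed
```

For example, `run_chain("/data/raw", [("mapper1.py", "reducer1.py", "/data/step1"), ("mapper2.py", "reducer2.py", "/data/final")])` runs the first job into `/data/step1` and then the second job over that directory. The intermediate directory is left behind and has to be removed (for example with `hadoop fs -rm -r /data/step1`) before the chain can be rerun.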

4 answers
  •  攒了一身酷
    2020-12-15 13:15

    Here is a great blog post on how to use Cascading with Hadoop Streaming: http://www.xcombinator.com/2009/11/18/how-to-use-cascading-with-hadoop-streaming/

    The value here is that you can mix Java (Cascading query flows) with your custom Streaming operations in the same app. I find this much less brittle than other methods.

    Note that the Cascade object in Cascading lets you chain multiple Flows (following the blog post above, your Streaming job would become a MapReduceFlow).

    Disclaimer: I'm the author of Cascading.
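The idea behind the Cascade object mentioned in the answer, running several jobs as one unit and ordering them by which job produces the data another job reads, can be illustrated in Python. This is only a sketch of that idea, not Cascading's API, which is Java; there, following the Cascading user guide, you would wrap each Streaming job's configuration in a `MapReduceFlow`, pass the flows to `new CascadeConnector().connect(...)`, and call `complete()` on the resulting `Cascade`. The `Job` class, the `plan` function and the paths below are hypothetical, and the code needs Python 3.9+ for `graphlib`.

```python
# Sketch of Cascade-style chaining: order jobs so producers run first.
# Not Cascading's API; Job, plan() and all paths are hypothetical.
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    sources: Tuple[str, ...]  # paths the job reads
    sinks: Tuple[str, ...]    # paths the job writes


def plan(jobs: List[Job]) -> List[str]:
    """Return job names in an order where producers run before consumers."""
    producer: Dict[str, str] = {}
    for job in jobs:
        for sink in job.sinks:
            producer[sink] = job.name
    # Each job depends on the jobs whose sinks it reads; paths that no job
    # writes (e.g. raw input already on HDFS) add no dependency.
    graph = {
        job.name: {producer[src] for src in job.sources if src in producer}
        for job in jobs
    }
    # static_order() raises graphlib.CycleError if jobs depend on each other.
    return list(TopologicalSorter(graph).static_order())
```

Running the plan is then a loop that starts each job (for example with a `hadoop jar ...` Streaming command) in the returned order, so the jobs can be declared in any order as long as their input and output paths line up.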
