Pipelining Hadoop MapReduce jobs

Submitted by 廉价感情 on 2019-12-23 03:37:13

Question


I have five MapReduce jobs that I currently run separately. I want to pipeline them so that the output of one job becomes the input of the next. At the moment I use a shell script to execute them in sequence. Is there a way to do this in Java? Please provide an example.

Thanks


Answer 1:


You may find JobControl to be the simplest method for chaining these jobs together. For more complex workflows, I'd recommend checking out Oozie.




Answer 2:


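Hi, I had a similar requirement. One way to do this is to submit the first job and block until it finishes:

// Job is org.apache.hadoop.mapreduce.Job; on Hadoop 2+ prefer Job.getInstance( getConf() )
Job job1 = new Job( getConf() );
// set the mapper, reducer and input/output paths for job1 here
job1.waitForCompletion( true );

and then check its status before starting the next job:

if(job1.isSuccessful()){
    //start another job with a different Mapper.
    //change config, and use job1's output path as job2's input path
    Job job2 = new Job( getConf() );
    job2.waitForCompletion( true );
}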
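
The wait-then-check pattern above generalizes to any number of jobs. The following is a minimal sketch of that control flow in plain Java, with no Hadoop dependency so it can be run anywhere. The names ChainSketch, Stage and runChain and the directory paths are hypothetical and exist only for illustration. In a real driver, each stage body would presumably configure a Job with FileInputFormat.addInputPath and FileOutputFormat.setOutputPath and return the result of job.waitForCompletion(true).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Sketch of a driver that runs stages in order and stops at the first failure. */
final class ChainSketch {

    /** Hypothetical stand-in for one MapReduce job: reads inputDir, writes outputDir. */
    static final class Stage {
        final String name;
        final BiPredicate<String, String> body; // (inputDir, outputDir) -> succeeded?

        Stage(String name, BiPredicate<String, String> body) {
            this.name = name;
            this.body = body;
        }
    }

    /** Runs the stages one after another and returns the names of those that succeeded. */
    static List<String> runChain(String inputDir, String workDir, List<Stage> stages) {
        List<String> completed = new ArrayList<>();
        String in = inputDir;
        for (Stage stage : stages) {
            String out = workDir + "/" + stage.name;
            // In a real driver, this is where job.waitForCompletion(true) would block.
            if (!stage.body.test(in, out)) {
                break; // a failed stage means later stages are never started
            }
            completed.add(stage.name);
            in = out; // the output of this stage is the input of the next
        }
        return completed;
    }

    public static void main(String[] args) {
        List<Stage> stages = Arrays.asList(
            new Stage("job1", (in, out) -> true),
            new Stage("job2", (in, out) -> false), // simulated failure
            new Stage("job3", (in, out) -> true));
        System.out.println(runChain("/data/input", "/data/work", stages)); // prints [job1]
    }
}
```

Because job2 reports failure, job3 is never started, which is the same behavior a shell script gets from checking each exit code.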



Answer 3:


Oozie is the solution for you. You can submit MapReduce jobs, Hive jobs, Pig jobs, system commands, and so on through Oozie's action tags.

It even has a coordinator, which acts like cron for your workflow.




Answer 4:


Another possibility is Cascading, which also provides an abstraction layer on top of Hadoop. It seems to offer a combination similar to Oozie workflows calling Pig scripts: you work closely with Hadoop concepts while letting Hadoop do the MapReduce heavy lifting.




Answer 5:


For your use case, I think Oozie would be a good fit. Oozie is a workflow scheduler in which you define actions (MapReduce, Java, shell, etc.) that perform computation, transformation, enrichment, and so on. For this case:

action A: reads input, writes a

action B: reads a, writes b

action C: reads b, writes c (final output)

The final output c is persisted in HDFS, and you can decide whether to keep or delete the intermediate outputs a and b.
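
The A, B, C wiring above can be written down as data before any job is launched. This is a small sketch in plain Java, not Oozie or Hadoop code. The names PipelinePlan, Step, plan and intermediateOutputs are hypothetical, and the strings stand in for HDFS paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: derive each action's input and output, and list the intermediate outputs. */
final class PipelinePlan {

    /** One action in the chain, described only by what it reads and writes. */
    static final class Step {
        final String input;
        final String output;

        Step(String input, String output) {
            this.input = input;
            this.output = output;
        }

        @Override
        public String toString() {
            return input + " -> " + output;
        }
    }

    /** Chains the outputs so that each step reads what the previous step wrote. */
    static List<Step> plan(String input, List<String> outputs) {
        List<Step> steps = new ArrayList<>();
        String current = input;
        for (String output : outputs) {
            steps.add(new Step(current, output));
            current = output;
        }
        return steps;
    }

    /** Every output except the last one is intermediate and may be deleted afterwards. */
    static List<String> intermediateOutputs(List<Step> steps) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < steps.size() - 1; i++) {
            result.add(steps.get(i).output);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Step> steps = plan("input", Arrays.asList("a", "b", "c"));
        System.out.println(steps);                      // prints [input -> a, a -> b, b -> c]
        System.out.println(intermediateOutputs(steps)); // prints [a, b]
    }
}
```

On a real cluster, the cleanup step would presumably go through Hadoop's FileSystem.delete(path, true) for each intermediate directory once the final output has been written.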

If you want to perform the computation of all three actions in a single one, you can use Cascading. The official Cascading documentation explains it in more detail, and you can also refer to my blog post on the topic: https://tech.flipkart.com/expressing-etl-workflows-via-cascading-192eb5e7d85d



Source: https://stackoverflow.com/questions/3939979/pipeling-hadoop-map-reduce-jobs
