I have five MapReduce jobs that I am currently running separately. I want to pipeline them all together, so that the output of one job goes to the next job. Currently I wrote a shell script to execute them all. Is there a way to write this in Java? Please provide an example.
Thanks
You may find JobControl to be the simplest method for chaining these jobs together. For more complex workflows, I'd recommend checking out Oozie.
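A minimal sketch of what a JobControl-based driver might look like; the job names, paths, and the elided mapper/reducer setup are illustrative assumptions, not part of your setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PipelineDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job1 = new Job(conf, "stage-1");
        // ...set mapper/reducer/key-value classes for job1 here...
        FileInputFormat.addInputPath(job1, new Path("/data/input"));
        FileOutputFormat.setOutputPath(job1, new Path("/data/stage1"));

        Job job2 = new Job(conf, "stage-2");
        // ...set mapper/reducer/key-value classes for job2 here...
        // job2 reads whatever job1 wrote
        FileInputFormat.addInputPath(job2, new Path("/data/stage1"));
        FileOutputFormat.setOutputPath(job2, new Path("/data/output"));

        ControlledJob cJob1 = new ControlledJob(conf);
        cJob1.setJob(job1);
        ControlledJob cJob2 = new ControlledJob(conf);
        cJob2.setJob(job2);
        cJob2.addDependingJob(cJob1); // job2 starts only after job1 succeeds

        JobControl control = new JobControl("pipeline");
        control.addJob(cJob1);
        control.addJob(cJob2);

        // JobControl implements Runnable, so run it in its own thread and poll
        Thread runner = new Thread(control);
        runner.start();
        while (!control.allFinished()) {
            Thread.sleep(5000);
        }
        control.stop();
    }
}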
Hi, I had a similar requirement. One way to do this is, after submitting the first job, to execute the following:
Job job1 = new Job( getConf() );
job1.waitForCompletion( true ); // blocks until job1 finishes
and then check the status using
if( job1.isSuccessful() ){
    //start another job with a different Mapper.
    //change the config first
    Job job2 = new Job( getConf() );
}
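Putting that pattern together, a complete driver for two chained jobs could look roughly like this; the class name, argument layout, and the commented-out mapper/reducer names are assumptions for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]); // output of job1, input of job2
        Path output = new Path(args[2]);

        Job job1 = new Job(conf, "stage-1");
        job1.setJarByClass(ChainedJobs.class);
        // job1.setMapperClass(FirstMapper.class);   // plug in your first mapper
        // job1.setReducerClass(FirstReducer.class); // and reducer
        FileInputFormat.addInputPath(job1, input);
        FileOutputFormat.setOutputPath(job1, intermediate);

        if (!job1.waitForCompletion(true)) {
            System.exit(1); // abort the pipeline if stage 1 fails
        }

        Job job2 = new Job(conf, "stage-2");
        job2.setJarByClass(ChainedJobs.class);
        // job2.setMapperClass(SecondMapper.class);  // a different mapper for stage 2
        FileInputFormat.addInputPath(job2, intermediate);
        FileOutputFormat.setOutputPath(job2, output);

        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}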
Oozie is the solution for you. You can submit map-reduce jobs, Hive jobs, Pig jobs, system commands, etc. through Oozie's action tags.
It even has a coordinator, which acts as a cron for your workflow.
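If you also want to kick off such a workflow from Java rather than the Oozie CLI, the Oozie client API can do it. A minimal sketch, assuming an Oozie server at its default URL and a workflow app already deployed to HDFS (both paths below are made up):

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class SubmitWorkflow {
    public static void main(String[] args) throws Exception {
        // URL of the Oozie server (assumed: a default local install)
        OozieClient client = new OozieClient("http://localhost:11000/oozie");

        Properties props = client.createConfiguration();
        // HDFS path of the deployed workflow app (assumed)
        props.setProperty(OozieClient.APP_PATH, "hdfs://namenode/user/me/my-workflow");
        props.setProperty("nameNode", "hdfs://namenode:8020");
        props.setProperty("jobTracker", "jobtracker:8021");

        // submit and start the workflow, then poll until it stops running
        String jobId = client.run(props);
        while (client.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10 * 1000);
        }
        System.out.println("Workflow finished: " + client.getJobInfo(jobId).getStatus());
    }
}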
Another possibility is Cascading, which also provides an abstraction layer on top of Hadoop: it seems to offer a similar combination of working closely with Hadoop concepts while still letting Hadoop do the M/R heavy lifting, much like Oozie workflows calling Pig scripts.
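As a rough illustration of the Cascading route, two flows can be tied together with a CascadeConnector so the second runs after the first; the taps, paths, and pipe names below are assumptions, and the bare Pipes stand in for real processing logic:

import java.util.Properties;
import cascading.cascade.Cascade;
import cascading.cascade.CascadeConnector;
import cascading.flow.Flow;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.SinkMode;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

public class CascadingPipeline {
    public static void main(String[] args) {
        HadoopFlowConnector connector = new HadoopFlowConnector(new Properties());

        // stage 1: read the raw input and write an intermediate dataset
        Tap input = new Hfs(new TextLine(), "/data/input");
        Tap intermediate = new Hfs(new TextLine(), "/data/intermediate", SinkMode.REPLACE);
        Flow flow1 = connector.connect("stage-1", input, intermediate, new Pipe("stage-1"));

        // stage 2: read the intermediate dataset and write the final output
        Tap output = new Hfs(new TextLine(), "/data/output", SinkMode.REPLACE);
        Flow flow2 = connector.connect("stage-2", intermediate, output, new Pipe("stage-2"));

        // the Cascade infers the dependency (flow2 consumes flow1's sink)
        Cascade cascade = new CascadeConnector().connect(flow1, flow2);
        cascade.complete(); // runs both flows in dependency order
    }
}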
For your use case, I think Oozie will be a good fit. Oozie is a workflow scheduler in which you can write different actions (map-reduce, java, shell, etc.) to perform compute, transformation, enrichment, and so on. For this case:
action A : reads input, writes a
action B : reads a, writes b
action C : reads b, writes c (final output)
You can finally persist c in HDFS, and you can decide whether to persist or delete the intermediate outputs.
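If you end up driving the pipeline from plain Java instead of Oozie, the same intermediate-output cleanup can be done with the HDFS API; a small sketch, where /data/a and /data/b stand in for the hypothetical intermediate directories above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanupIntermediates {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // once the final output c is persisted, drop the intermediate outputs
        fs.delete(new Path("/data/a"), true); // true = delete recursively
        fs.delete(new Path("/data/b"), true);
    }
}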
If you want the computation done by all three actions in a single job, you can use Cascading. You can learn more about Cascading from the official documentation, and you can also refer to my blog post on the same topic: https://tech.flipkart.com/expressing-etl-workflows-via-cascading-192eb5e7d85d
Source: https://stackoverflow.com/questions/3939979/pipeling-hadoop-map-reduce-jobs