Tool/Ways to schedule Amazon's Elastic MapReduce jobs

…衆ロ難τιáo~ posted on 2020-01-24 10:26:12

Question


I use EMR to create new instances, process the jobs, and then shut the instances down.

My requirement is to schedule jobs periodically. One easy implementation would be to use Quartz to trigger the EMR jobs. In the long run, however, I would prefer an out-of-the-box MapReduce scheduling solution. Is there any out-of-the-box scheduling feature in EMR or the AWS SDK that I can use for this? I can see that Auto Scaling supports scheduling, but I want to schedule an EMR job flow instead.


Answer 1:


Apache Oozie, a workflow scheduler for Hadoop, does just that.

Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

Oozie Workflow jobs are Directed Acyclic Graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and DistCp) as well as system-specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.
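To make the "triggered by time (frequency)" idea concrete, here is a minimal Python sketch of how a fixed-frequency schedule expands into individual run times. It is only an illustration of the concept, not Oozie's code or API: the function name `nominal_times` is made up for this example, the end time is treated as exclusive (an assumption of the sketch), and the data-availability trigger is not modeled at all.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency_minutes):
    """Yield each scheduled run time in [start, end) at a fixed frequency.

    Hypothetical helper for illustration only; it is not part of Oozie.
    """
    if frequency_minutes <= 0:
        raise ValueError("frequency_minutes must be positive")
    step = timedelta(minutes=frequency_minutes)
    current = start
    while current < end:  # end is exclusive in this sketch
        yield current
        current += step


# Example: a job that should run every 6 hours during one day.
runs = list(nominal_times(datetime(2020, 1, 24), datetime(2020, 1, 25), 360))
```

A real coordinator additionally waits for input data to exist before starting each run, which is what sets it apart from a plain timer.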

Here is a simple example of Elastic MapReduce bootstrap actions for configuring Apache Oozie: https://github.com/lila/emr-oozie-sample

Be aware, though, that Oozie is somewhat complicated. It is worth adopting only if you have many jobs to schedule, monitor, and maintain; if you have just two or three jobs to run periodically, a handful of cron jobs is enough.

You may also want to explore Amazon Simple Workflow Service (SWF).
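For the cron route, one possible approach is a small script that starts a transient EMR cluster, runs a single step, and lets the cluster terminate on its own. The sketch below uses boto3's `run_job_flow` call; everything specific in it is an assumption for illustration and not from the question: the bucket paths, the release label, the instance types and count, the region, and the default role names would all need to match your own account.

```python
"""Sketch: launch a transient EMR cluster that runs one step, then terminates."""
import datetime
import sys


def build_job_flow_request(run_date, jar_uri, jar_args):
    """Build the keyword arguments for boto3's EMR run_job_flow call."""
    return {
        "Name": "scheduled-job-" + run_date.isoformat(),
        "ReleaseLabel": "emr-5.36.0",  # assumption: pick the release you use
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption
            "SlaveInstanceType": "m5.xlarge",   # assumption
            "InstanceCount": 3,                 # assumption
            # False makes the cluster shut down once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "main-step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {"Jar": jar_uri, "Args": list(jar_args)},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumption: default EMR roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def launch(request, region="us-east-1"):
    """Submit the request to EMR and return the new cluster (job flow) id."""
    import boto3  # imported lazily so the request builder works without boto3

    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__" and "--launch" in sys.argv:
    # Hypothetical S3 locations; replace with your own JAR and data paths.
    request = build_job_flow_request(
        datetime.date.today(),
        "s3://my-bucket/jobs/my-job.jar",
        ["s3://my-bucket/input/", "s3://my-bucket/output/"],
    )
    print(launch(request))
```

A crontab entry such as `0 2 * * * python3 /opt/jobs/launch_emr.py --launch` (the path is hypothetical) would then start the job every day at 02:00. Keeping the request builder separate from the AWS call makes it possible to check the parameters without touching AWS.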



Source: https://stackoverflow.com/questions/14014486/tool-ways-to-schedule-amazons-elastic-mapreduce-jobs
