Question
Does Oozie support scheduling an ad-hoc Hive query through a REST API?
We're building a system where a user can search documents in Hadoop, optionally specifying some attributes of the data to be searched, with Hive performing the query against Hadoop. Because these fields are optional, we don't know ahead of time what the Hive query will look like (in particular, which tables it will use). We have a service that processes the user's query at run-time to generate the corresponding Hive query.
We'd like to be able to schedule these queries via Oozie, but I haven't been able to find documentation on how to do this. I assume it is possible. Is there sample Java code available that shows how to perform this operation?
Answer 1:
Use the Oozie Coordinator to schedule jobs; see the Apache documentation and the Oozie Coordinator examples. Also take a look at Azkaban for scheduling.
Answer 2:
Proxy Hive Job Submission via the REST API allows users to submit jobs without creating a workflow XML on HDFS:
- https://oozie.apache.org/docs/5.1.0/WebServicesAPI.html#Proxy_Hive_Job_Submission
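Since the question asks for Java, here is a minimal sketch of what that proxy submission could look like using only the JDK (Java 11+). The property names are the ones listed in the linked documentation; the host names, ports, libpath, user, and query are placeholders I made up, and the class and method names are purely illustrative. The sketch only builds the request; the commented-out line shows how it could be sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.Map;

final class ProxyHiveSubmission {

    // Escape characters that are significant inside XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Render name/value pairs as a Hadoop-style <configuration> document.
    static String buildConfigXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property><name>").append(escapeXml(e.getKey()))
              .append("</name><value>").append(escapeXml(e.getValue()))
              .append("</value></property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    // Properties named in the Proxy Hive Job Submission docs.
    // Every value below is a placeholder (assumption), not a real cluster address.
    static Map<String, String> hiveJobProperties(String hiveQuery, String user) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("fs.default.name", "hdfs://namenode.example.com:8020");
        p.put("mapred.job.tracker", "resourcemanager.example.com:8032");
        p.put("user.name", user);
        p.put("oozie.hive.script", hiveQuery); // the generated query text itself
        p.put("oozie.libpath", "hdfs://namenode.example.com:8020/user/oozie/share/lib/hive");
        p.put("oozie.proxysubmission", "true");
        return p;
    }

    // Build (but do not send) the POST to <oozie base URL>/v1/jobs?jobtype=hive.
    static HttpRequest buildRequest(String oozieBaseUrl, String configXml) {
        return HttpRequest.newBuilder(URI.create(oozieBaseUrl + "/v1/jobs?jobtype=hive"))
                .header("Content-Type", "application/xml;charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(configXml))
                .build();
    }

    public static void main(String[] args) {
        String query = "SELECT id FROM docs WHERE size < 100"; // hypothetical generated query
        String xml = buildConfigXml(hiveJobProperties(query, "alice"));
        HttpRequest request = buildRequest("http://oozie.example.com:11000/oozie", xml);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(xml);
        // To actually submit against a running Oozie server:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

According to the same documentation page, a successful submission returns HTTP 201 with a small JSON body containing the job id. Note that the query text travels in the request itself, which fits the case where the Hive query is generated at run-time; as far as I can tell this submits the job to run right away rather than scheduling it for later.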
You can also use the Fluent Job API to programmatically build workflows:
- https://oozie.apache.org/docs/5.1.0/DG_FluentJobAPI.html#A_More_Verbose_Example
- https://github.com/apache/oozie/blob/master/fluent-job/fluent-job-api/src/test/java/org/apache/oozie/fluentjob/api/action/TestHive2ActionBuilder.java
As mentioned above, the Oozie Coordinator can be used to schedule workflows and execute them at regular intervals. Beyond time dependencies, you can also define data dependencies (such as the existence of specific files on HDFS) that must be satisfied before a workflow starts.
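To make the time and data dependencies concrete, here is a rough sketch that generates a coordinator definition from Java, which may be handy if the definitions are produced by a service at run-time. The element names follow the Oozie coordinator schema; the class and method names, the daily frequency, the UTC timezone, and all paths are assumptions for illustration, and the inputs are assumed to be XML-safe (no escaping is done).

```java
final class CoordinatorSketch {

    // Render a coordinator definition that fires once a day (time dependency)
    // and waits for one HDFS dataset instance (data dependency).
    static String coordinatorXml(String name, String start, String end,
                                 String datasetUriTemplate, String workflowPath) {
        return String.join("\n",
            "<coordinator-app name=\"" + name + "\" frequency=\"${coord:days(1)}\"",
            "                 start=\"" + start + "\" end=\"" + end + "\" timezone=\"UTC\"",
            "                 xmlns=\"uri:oozie:coordinator:0.4\">",
            "  <datasets>",
            "    <dataset name=\"input\" frequency=\"${coord:days(1)}\"",
            "             initial-instance=\"" + start + "\" timezone=\"UTC\">",
            "      <uri-template>" + datasetUriTemplate + "</uri-template>",
            "    </dataset>",
            "  </datasets>",
            "  <input-events>",
            "    <data-in name=\"in\" dataset=\"input\">",
            "      <instance>${coord:current(0)}</instance>",
            "    </data-in>",
            "  </input-events>",
            "  <action>",
            "    <workflow>",
            "      <app-path>" + workflowPath + "</app-path>",
            "    </workflow>",
            "  </action>",
            "</coordinator-app>",
            "");
    }

    public static void main(String[] args) {
        // All paths and dates below are placeholders.
        System.out.print(coordinatorXml(
            "daily-hive-query",
            "2024-01-01T00:00Z",
            "2024-12-31T00:00Z",
            "hdfs://namenode.example.com:8020/data/docs/${YEAR}/${MONTH}/${DAY}",
            "hdfs://namenode.example.com:8020/apps/hive-query-wf"));
    }
}
```

The generated file would be placed on HDFS as coordinator.xml and the job submitted with the `oozie.coord.application.path` property pointing at its directory. Each daily run then waits until the matching dataset directory is marked complete before starting the workflow.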
Source: https://stackoverflow.com/questions/23275414/scheduling-an-ad-hoc-query-with-hive-hadoop-using-oozie