Question
I am operationalizing a data import process that takes data from an existing database and partitions it within a directory scheme in HDFS. By default, the job is split into four map processes, and right now I have the job configured to run daily through Apache Oozie.
Since Oozie is DAG-oriented, is it possible to create a validationStep within the Oozie workflow that does the following:
- Run a Hive query on the newly imported data to return the row count
- Run a SQL query to return the row count in the original data source
- Compare the two values
- If they do not match, return FAIL and kill the job; if they match, return TRUE and OK
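To make the comparison step concrete, here is a minimal sketch of what I have in mind, assuming the two counts have already been obtained by earlier steps and are passed in as arguments. The names `compare_counts` and `main` are hypothetical, not part of Oozie, Hive, or Sqoop. The idea rests on the assumption that the step runs as a script whose non-zero exit code makes the workflow action fail.

```python
import sys


def compare_counts(source_count: int, hive_count: int) -> int:
    """Return a process exit code: 0 when the counts match, 1 otherwise."""
    if source_count == hive_count:
        print(f"OK: both sides report {source_count} rows")
        return 0
    print(f"FAIL: source={source_count}, hive={hive_count}", file=sys.stderr)
    return 1


def main(argv: list) -> int:
    """Parse '<script> <source_count> <hive_count>' and compare the counts."""
    if len(argv) != 3:
        print("usage: compare_counts.py <source_count> <hive_count>",
              file=sys.stderr)
        return 2
    return compare_counts(int(argv[1]), int(argv[2]))
```

A wrapper script would end with `sys.exit(main(sys.argv))`, so that a mismatch produces exit code 1 and the workflow could route that failure to a kill node.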
I understand that Sqoop has a built-in validate option, but as far as I can tell it does not apply here, because I am not running the import against a single whole table (each of my Sqoop imports is partitioned by a specific date).
Is this possible?
Source: https://stackoverflow.com/questions/25432612/validate-a-sqoop-with-use-of-query-and-where-clauses