Oozie

Using Oozie ssh action with private key

Submitted by 对着背影说爱祢 on 2019-12-24 04:15:30
Question: I am trying to run a workflow on a Cloudera cluster using the Oozie ssh action. What I need is to run my scripts from one specific node only, and the ssh action looked like the right solution. While configuring the workflow, however, I found that Oozie only accepts a "user" and a "host name" in the action configuration, while I also need a private SSH key for the connection. Is it possible to perform an Oozie ssh action with a private key? Or maybe there are other ways to run Oozie
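
For reference, the ssh action schema only takes a host (as user@hostname), a command, and arguments; there is no element for a key file, because the connection is opened as the Unix user that runs the Oozie server. The private key therefore has to be configured out of band on the Oozie host, e.g. as an IdentityFile entry in that user's ~/.ssh/config. A minimal sketch of the action itself (host and script path are hypothetical):

    <action name="run-on-edge-node">
        <!-- ssh action: runs the command on the remote host as the Oozie server's Unix user -->
        <ssh xmlns="uri:oozie:ssh-action:0.1">
            <host>myuser@edge-node.example.com</host>
            <command>/home/myuser/scripts/run.sh</command>
            <args>${wf:id()}</args>
            <capture-output/>
        </ssh>
        <ok to="next-step"/>
        <error to="fail"/>
    </action>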

Job via Oozie HDP 2.1 not creating job.splitmetainfo

Submitted by ぃ、小莉子 on 2019-12-24 03:20:49
Question: When trying to execute a Sqoop job that has my Hadoop program passed as a jar file in the -jarFiles parameter, the execution fails with the error below. No resolution seems to be available. Other jobs run by the same Hadoop user execute successfully.

    org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://sandbox.hortonworks.com:8020/user/root/.staging/job_1423050964699_0003/job.splitmetainfo
        at org.apache.hadoop.mapreduce.v2
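
Not a confirmed fix for the missing job.splitmetainfo file, but it is worth ruling out how the extra jar reaches the job: in an Oozie sqoop action the usual mechanisms are the workflow's lib/ directory or a <file> element, rather than a -jarFiles argument. A hedged sketch (connection string, table, and jar path are hypothetical):

    <action name="sqoop-import">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://db.example.com/mydb --table events --target-dir /data/events</command>
            <!-- ships the jar through the distributed cache instead of -jarFiles -->
            <file>/user/root/lib/my-program.jar#my-program.jar</file>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>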

Is there a way to use config-default.xml globally in Oozie?

Submitted by 梦想的初衷 on 2019-12-24 01:27:56
Question: According to the documentation, config-default.xml must be present in the workflow workspace:

    /workflow.xml
    /config-default.xml
    /lib/ (*.jar; *.so)

The problem: I've created a custom Oozie action and am trying to add default values for retry-max and retry-interval to all the custom actions. So my workflow.xml will look like this:

    <workflow-app xmlns="uri:oozie:workflow:0.3" name="wf-name">
        <action name="custom-action" retry-max="${default_retry_max}" retry-interval="${default_retry_interval}">
            <
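
config-default.xml is resolved per application (it sits next to workflow.xml on HDFS), so it is not global, but it is the standard place to define the EL variables used above. A minimal sketch matching the variables in that workflow (the values are placeholders):

    <!-- config-default.xml, placed in the application directory next to workflow.xml -->
    <configuration>
        <property>
            <name>default_retry_max</name>
            <value>3</value>
        </property>
        <property>
            <name>default_retry_interval</name>
            <value>1</value>
        </property>
    </configuration>

To share such values across many workflows without copying the file, the usual workarounds are stamping config-default.xml into each application directory at deploy time or passing the values through job.properties at submission.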

How to enable/set up log4j for Oozie Java workflows?

Submitted by ╄→尐↘猪︶ㄣ on 2019-12-24 00:47:22
Question: I'm running an Oozie Java workflow (the jar file is in HDFS), and I'd like to add logging to my application. Does anybody know how to do it? Where should I put my log4j.properties file? How can I make log4j write its output to a location in HDFS?

Answer 1: Looking at this documentation, you can try adding oozie-log4j.properties in your Oozie directory (where workflow.xml is). Here are the default settings:

    log4j.appender.oozie=org.apache.log4j.rolling.RollingFileAppender
    log4j
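
As a sketch of the application side (as opposed to Oozie's own server logging): ship a log4j.properties with the workflow, e.g. via a <file> element so it lands in the container's working directory (which is typically on the java action's classpath), and route output to stdout so it is captured in the YARN container logs (retrievable with yarn logs -applicationId <id>). Note that stock log4j 1.x appenders cannot write directly to HDFS. A hedged example:

    # hypothetical log4j.properties shipped alongside the workflow
    log4j.rootLogger=INFO, console
    # ConsoleAppender writes to stdout, which YARN captures per container
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n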

How does one check completion status of LoadApplicationService?

Submitted by 会有一股神秘感。 on 2019-12-23 23:18:12
Question: I have two action nodes in my workflow: javaMainAction and javaMainAction2. My LoadApplicationService method returns SUCCESS or FAILURE after execution. How can I check the response when SUCCESS is returned? workflow.xml:

    <workflow-app name="WorkflowJavaMainAction" xmlns="uri:oozie:workflow:0.1">
        <start to="javaMainAction" />
        <action name="javaMainAction">
            <java>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
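
One standard Oozie pattern for this: add <capture-output/> to the java action, have the main class write its result as a Java properties file to the path given by the oozie.action.output.properties system property, and branch with a decision node. A hedged sketch (the "status" key and the node names are assumptions):

    <action name="javaMainAction">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.LoadApplicationService</main-class>
            <!-- makes properties written by the main class visible to EL functions -->
            <capture-output/>
        </java>
        <ok to="check-status"/>
        <error to="fail"/>
    </action>
    <decision name="check-status">
        <switch>
            <!-- proceed only if the captured 'status' property equals SUCCESS -->
            <case to="javaMainAction2">${wf:actionData('javaMainAction')['status'] eq 'SUCCESS'}</case>
            <default to="fail"/>
        </switch>
    </decision>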

Can Apache Oozie run docker containers?

Submitted by 你。 on 2019-12-23 20:22:37
Question: I am currently comparing DAG-based workflow tools such as Airflow and Luigi for scheduling generic Docker containers as well as Spark jobs. Can Apache Oozie run generic Docker containers through its shell action, or is Oozie strictly meant for Hadoop tools like Pig and Hive? Quoting the Oozie documentation: "Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as
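
On the shell-action question: the shell action is a generic process launcher, so it can invoke the Docker CLI, assuming Docker is installed on the NodeManager hosts and the YARN container user is permitted to run it. A hedged sketch (image and arguments are hypothetical):

    <action name="run-container">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>docker</exec>
            <argument>run</argument>
            <argument>--rm</argument>
            <argument>myrepo/myimage:latest</argument>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

The caveat is that the action may be scheduled on any NodeManager, so every node needs the image (or registry access) and consistent Docker permissions.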

Oozie Hive action hangs and heart beats forever

Submitted by 怎甘沉沦 on 2019-12-22 18:22:01
Question: I am attempting to run a Hive action through an Oozie workflow that I created in Hue, but the action "heart beat"s forever and never executes the Hive SQL. I've read other posts about heart beating forever, but this one seems to occur at a different point, after the SQL statement has been parsed. I've checked the memory on each node in the cluster and verified that the task-count parameters are reasonable. Here is the hive-config.xml file:

    <configuration>
        <property>
            <name>javax
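
An endless stream of "Heart beat" lines often means the Oozie launcher job is holding the slots or queue capacity that the actual Hive MapReduce job needs, so neither can proceed. A common mitigation, offered as a hedged sketch rather than a confirmed fix for this case, is routing launchers to their own queue via the oozie.launcher.* prefix ("launchers" is a hypothetical queue name):

    <action name="hive-action">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <!-- send the launcher MR job to a queue separate from the Hive job -->
                    <name>oozie.launcher.mapred.job.queue.name</name>
                    <value>launchers</value>
                </property>
            </configuration>
            <script>query.q</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>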

Suggestions for scheduling tools for building Hadoop-based data pipelines

Submitted by 青春壹個敷衍的年華 on 2019-12-22 17:54:56
Question: Between Apache Oozie, Spotify/Luigi, and airbnb/airflow, what are the pros and cons of each? I have used Oozie and Airflow in the past to build data-ingestion pipelines using Pig and Hive. Currently I am building a pipeline that looks at logs, extracts useful events, and puts them on Redshift. I found Airflow much easier to use, test, and set up. It has a much nicer UI and lets users perform actions from the UI itself, which is not the case with Oozie.

How can external clients notify an Oozie workflow with an HTTP callback?

Submitted by 大兔子大兔子 on 2019-12-22 10:56:08
Question: Say an Oozie workflow is started with three Java action nodes. Each Java action makes an asynchronous HTTP call to an external web service (such as one exposed by google.com, yahoo.com, etc.) outside the Oozie/Hadoop cluster. I assume this is doable, since Oozie supports custom action nodes. Now, I don't want Oozie to poll the external web service from time to time to check whether the work is done there. I want the external
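
Oozie's internal asynchronous-action callback (the mechanism the ssh action relies on) is not a public endpoint that arbitrary external services can be pointed at, so a common substitute, sketched here under that assumption, is to turn the callback into a data dependency: the external service (or a thin relay) writes a done-flag into HDFS when it finishes, and a coordinator releases the follow-up workflow when the flag appears. Paths, frequency, and names are hypothetical:

    <coordinator-app name="wait-for-external" frequency="${coord:days(1)}"
                     start="${start}" end="${end}" timezone="UTC"
                     xmlns="uri:oozie:coordinator:0.4">
        <datasets>
            <dataset name="external-done" frequency="${coord:days(1)}"
                     initial-instance="${start}" timezone="UTC">
                <!-- the external service writes _SUCCESS here when its work is done -->
                <uri-template>${nameNode}/callbacks/${YEAR}${MONTH}${DAY}</uri-template>
                <done-flag>_SUCCESS</done-flag>
            </dataset>
        </datasets>
        <input-events>
            <data-in name="done" dataset="external-done">
                <instance>${coord:current(0)}</instance>
            </data-in>
        </input-events>
        <action>
            <workflow>
                <app-path>${nameNode}/apps/continue-after-callback</app-path>
            </workflow>
        </action>
    </coordinator-app>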