Oozie

Getting Started with Oozie

Posted by 拥有回忆 on 2019-11-30 13:34:30
1 Oozie Overview
Oozie is an open-source framework built on a workflow engine that schedules and coordinates Hadoop MapReduce and Pig jobs. It is mainly used for scheduled tasks, and multiple tasks can be scheduled in their logical execution order.

2 Functional Modules
2.1 Modules
1. Workflow: runs flow nodes in sequence; supports fork (branch into multiple nodes) and join (merge multiple nodes into one).
2. Coordinator: triggers workflows on a schedule.
3. Bundle: binds multiple Coordinators together.

2.2 Common Nodes
Control Flow Nodes: usually defined at the start or end of a workflow, such as start, end, and kill; they also provide the workflow's routing mechanisms, such as decision, fork, and join.
Action Nodes: nodes that perform a concrete action, such as copying a file or running a shell script.

3 Installation and Deployment
3.1 Hadoop Configuration
core-site.xml

<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:8020</value>
    </property>
    <!-- Storage directory for files Hadoop generates at runtime -->
    <property>
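Returning to the node types in section 2.2, here is a minimal workflow.xml sketch (not from the original article) showing start, fork, join, kill, and end working together; the node names, shell scripts, and ${jobTracker}/${nameNode} parameters are all illustrative placeholders.

<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="forkNode"/>
    <!-- fork: run two branches in parallel -->
    <fork name="forkNode">
        <path start="shellA"/>
        <path start="shellB"/>
    </fork>
    <action name="shellA">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>a.sh</exec>
            <file>${nameNode}/user/scripts/a.sh#a.sh</file>
        </shell>
        <ok to="joinNode"/>
        <error to="fail"/>
    </action>
    <action name="shellB">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>b.sh</exec>
            <file>${nameNode}/user/scripts/b.sh#b.sh</file>
        </shell>
        <ok to="joinNode"/>
        <error to="fail"/>
    </action>
    <!-- join: wait for both branches before moving on -->
    <join name="joinNode" to="end"/>
    <kill name="fail">
        <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>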

Installing hue, and integrating it with hive, hdfs, and oozie

Posted by 六月ゝ 毕业季﹏ on 2019-11-30 11:05:58
Environment requirements: jdk, maven, git

Third-party dependencies:
yum install -y gcc-c++ libxml2-devel.x86_64 libxslt-devel.x86_64 python-devel openldap-devel asciidoc cyrus-sasl-gssapi openssl-devel mysql-devel sqlite-devel gmp-devel libffi-devel npm
Install anything else that turns out to be missing.

Create the hue user and group:
groupadd hue
useradd -g hue hue

Switch to the user and enter hue's home directory:
su hue
cd ~

Pull the hue source from GitHub:
git clone https://github.com/cloudera/hue.git

Enter the hue-master directory and build:
cd hue-master
make apps
This takes quite a while; if more dependencies turn out to be missing along the way, install them yourself.

Edit the hue configuration file:
vi /home/hue/hue-master/desktop/conf/pseudo-distributed.ini

Time zone:
[desktop]
time_zone=Asia/Shanghai

User and group:
[desktop]
server_user=hue
server_group=hue
default_user=hue
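The excerpt cuts off before the integration settings the title promises. As a sketch, the relevant sections of pseudo-distributed.ini for hdfs, hive, and oozie might look like this, assuming a single-node cluster on default ports; all hostnames and ports below are placeholders:

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      # placeholder NameNode and WebHDFS endpoints
      fs_defaultfs=hdfs://localhost:8020
      webhdfs_url=http://localhost:50070/webhdfs/v1

[beeswax]
  # placeholder HiveServer2 endpoint
  hive_server_host=localhost
  hive_server_port=10000

[liboozie]
  # placeholder Oozie server URL
  oozie_url=http://localhost:11000/oozie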

pass username and password to sqoop meta connect from oozie

Posted by 谁都会走 on 2019-11-30 09:54:12
Question

<sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <arg>job</arg>
    <arg>--meta-connect</arg>
    <arg>jdbc:mysql://FQDN:3306/sqoop</arg>
    <arg>--exec</arg>
    <arg>fabric_inventory</arg>
</sqoop>

Now, to pass username and password for the --meta-connect here, if I pass it as follows in oozie.xml:

<arg>jdbc:mysql://FQDN:3306/sqoop?user=sqoop&password=sqoop</arg>

or

<arg>jdbc:mysql://FQDN:3306/sqoop?user=sqoop&amp;password=sqoop</arg>

It
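Escaping the ampersand as &amp; is the usual first step, since the workflow file is parsed as XML. Another option (a sketch, not from the original thread) is to keep the credentials out of the workflow entirely and set Sqoop's metastore autoconnect properties in sqoop-site.xml; the values are placeholders:

<!-- sqoop-site.xml -->
<property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>sqoop</value>
</property>
<property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value>sqoop</value>
</property>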

Differences between Oozie and Azkaban

Posted by 丶灬走出姿态 on 2019-11-30 09:27:41
Differences between Oozie and Azkaban:

Workflow definition: Oozie workflows are defined in XML, while Azkaban uses properties files.
Deployment: Oozie is comparatively hard to deploy; it also pulls task logs from YARN.
Failure detection: in Azkaban, a task counts as successful as long as its process ran, even if it actually failed, which is a bug; Oozie detects task success and failure reliably.
Operating workflows: Azkaban is driven through its Web UI; Oozie supports the Web UI, a REST API, and a Java API.
Access control: Oozie has essentially no access control, while Azkaban has fairly complete permissions that let users grant read, write, and execute on workflows.
Where actions run: Oozie actions mostly run inside Hadoop, while Azkaban actions run on the Azkaban server.
Workflow state: Azkaban keeps the state of running workflows in memory; Oozie persists it in MySQL.
On failure: Azkaban loses all running workflows, whereas Oozie can resume a failed workflow where it left off.

Source: https://www.cnblogs.com/yumengfei/p/11576447.html

IOException: Filesystem closed exception when running oozie workflow

Posted by 限于喜欢 on 2019-11-30 04:24:17
We are running a workflow in Oozie. It contains two actions: the first is a MapReduce job that generates files in HDFS, and the second is a job that should copy the data from those files into a database. Both parts complete successfully, but Oozie throws an exception at the end that marks the run as a failed process. This is the exception:

2014-05-20 17:29:32,242 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:lpinsight (auth:SIMPLE) cause:java.io.IOException: Filesystem closed
2014-05-20 17:29:32,243 WARN org.apache.hadoop.mapred.Child: Error running child
java.io
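A frequent cause of this error (an assumption here; the excerpt ends before any resolution) is application code calling close() on the FileSystem handle returned by FileSystem.get(), which is shared through Hadoop's FileSystem cache, so the final cleanup then finds it already closed. One workaround is to disable the cache for the affected action via its <configuration> block:

<!-- give this action a private FileSystem instance instead of the shared cached one -->
<property>
    <name>fs.hdfs.impl.disable.cache</name>
    <value>true</value>
</property>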

Monitoring, alerting, and restarting when an oozie job dies

Posted by 放肆的年华 on 2019-11-30 01:39:01
Oozie Coordinator jobs and actions have many states, but Oozie does not seem to support automatically relaunching a job, say, 30 minutes after it enters a state such as FAILED, because its trigger conditions are limited to a few kinds: a time frequency, the availability of a dataset instance, or possibly some other external event. Since input-events and output-events do seem to support datasets, you could imagine writing a file in the workflow's error handler as a "failed" flag, making that flag the dataset precondition of each action, then re-running the action, and cleaning the flag up again before each action starts. Quite apart from how convoluted that is, such a workflow may not even be writable, or if written may turn into an infinite loop. When a job dies, it is usually because cluster resources or the network had problems at the time, so the right move is to kill the whole job and re-run it after a while (provided, of course, that your workflow logic tolerates repeated execution, regardless of when and how often it runs). For alerting, you can add a script to the error-to-kill transition that sends an email, or use the email action; for details see: https://www.cnblogs.com/wind-xwj/p/8946760.html

The input-events and output-events elements
The input events of a Coordinator application specify the input conditions that must be satisfied before a Coordinator action runs; in the current Oozie version, only dataset instances are supported.
A Coordinator action may generate one or more dataset instances
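For the alerting side mentioned above, a minimal sketch of an Oozie email action wired into the error path; the recipient address is a placeholder, and the SMTP settings in oozie-site.xml are assumed to be configured:

<action name="alert-email">
    <email xmlns="uri:oozie:email-action:0.1">
        <to>ops@example.com</to>
        <subject>Oozie workflow ${wf:id()} failed</subject>
        <body>Failed node: ${wf:lastErrorNode()}; error: ${wf:errorMessage(wf:lastErrorNode())}</body>
    </email>
    <ok to="fail"/>
    <error to="fail"/>
</action>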

Oozie > Java action > why property oozie.launcher.mapred.child.java.opts does not work

Posted by 风流意气都作罢 on 2019-11-29 16:15:37
I am working on Oozie with a Java action. The Java action should use the Java option -Xmx15g, so I set the property oozie.mapreduce.map.memory.mb to 25600 (25G) in case some extra memory is needed. After this simple setting I ran the Oozie job, and there was, of course, an OutOfMemory (heap space) error at Java runtime. So I set oozie.launcher.mapred.child.java.opts to -Xmx15g in the property node of the Java action, based on this link: http://downright-amazed.blogspot.fi/2012/02/configure-oozies-launcher-job.html . But I still got the same OutOfMemory error. Then I
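One thing worth checking (a sketch, not the resolution from the original thread): the Oozie Java action schema has its own <java-opts> element for the JVM that runs the main class, which avoids the launcher properties altogether. The main class below is hypothetical:

<action name="javaAction">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- hypothetical main class -->
        <main-class>com.example.Main</main-class>
        <java-opts>-Xmx15g</java-opts>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>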

sqoop fails to store incremental state to the metastore

Posted by 与世无争的帅哥 on 2019-11-29 15:51:56
I get this on saving incremental import state:

16/05/15 21:43:05 INFO tool.ImportTool: Saving incremental import state to the metastore
16/05/15 21:43:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Error communicating with database
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:426)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.update(HsqldbJobStorage.java:445)
    at org.apache.sqoop.tool.ImportTool.saveIncrementalState(ImportTool.java:164)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java
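A common trigger for this error (an assumption; the excerpt stops before any diagnosis) is several jobs hitting the embedded single-user HSQLDB metastore at once, for example parallel Oozie actions. The usual mitigation is to run "sqoop metastore" as a shared service and point clients at it in sqoop-site.xml; the host below is a placeholder:

<property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:hsql://metastore-host:16000/sqoop</value>
</property>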

Oozie Workflow failed due to error JA017

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-11-29 12:05:03
I am using Apache Oozie 4.3.0 with Hadoop 2.7.3. I have developed a very simple Oozie workflow, which just has a sqoop action to export system events to a MySQL table.

<workflow-app name="WorkflowWithSqoopAction" xmlns="uri:oozie:workflow:0.1">
    <start to="sqoopAction"/>
    <action name="sqoopAction">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>export --connect jdbc:mysql://localhost/airawat --username devUser --password myPwd --table eventsgranularreport --direct --enclosed-by '\"' --export
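Error JA017 generally means Oozie could not look up the Hadoop job ID of the launched action. With Hadoop 2.x, one commonly reported cause (an assumption here; the excerpt ends before the answer) is a missing or unreachable MapReduce JobHistory server. A sketch of the mapred-site.xml entries, with the host as a placeholder:

<!-- mapred-site.xml; start the daemon with mr-jobhistory-daemon.sh start historyserver -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop101:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop101:19888</value>
</property>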

Override hadoop's mapreduce.fileoutputcommitter.marksuccessfuljobs in oozie

Posted by 这一生的挚爱 on 2019-11-29 11:11:38
<property>
    <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
    <value>false</value>
</property>

I want to override the above property to true. It needs to stay false for the rest of the jobs on the cluster, but in my Oozie workflow I need Hadoop to create a _SUCCESS file in the output directory after the job completes. It is a Hive action in the workflow that writes the output. Please help.

Matthew Rathbone: Hive unfortunately overrides this capability by setting its own NullOutputCommitter:

conf.setOutputCommitter(NullOutputCommitter.class);

see src/shims/src/0.20/java/org
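For actions that do go through FileOutputCommitter (Hive's NullOutputCommitter caveat aside), the per-action override would sit in the action's <configuration> block, leaving the cluster default untouched. A sketch against the hive action schema; the script name is hypothetical:

<hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <!-- override for this action only; the cluster-wide default stays false -->
        <property>
            <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
            <value>true</value>
        </property>
    </configuration>
    <script>query.hql</script>
</hive>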