Oozie

Getting Started with Oozie

Posted by 拥有回忆 on 2019-11-30 13:34:30
1 Oozie Overview
Oozie is an open-source framework built on a workflow engine that schedules and coordinates Hadoop MapReduce and Pig jobs. It is mainly used for scheduled tasks, and multiple tasks can be scheduled in their logical execution order.

2 Functional Modules
2.1 Modules
1. Workflow: runs flow nodes in sequence; supports fork (branch into multiple nodes) and join (merge multiple nodes into one).
2. Coordinator: triggers workflows on a schedule.
3. Bundle: binds multiple Coordinators together.

2.2 Common Nodes
Control Flow Nodes: usually defined at the start or end of a workflow, such as start, end, and kill; they also provide the workflow's routing mechanisms, such as decision, fork, and join.
Action Nodes: nodes that perform a concrete action, such as copying a file or running a shell script.

3 Installation and Deployment
3.1 Hadoop Configuration
core-site.xml

<configuration>
    <!-- Address of the HDFS NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:8020</value>
    </property>
    <!-- Storage directory for files Hadoop generates at runtime -->
    <property>
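Returning to the node types in section 2.2, here is a minimal workflow.xml sketch (not from the original article) showing start, fork, join, kill, and end working together; the node names, shell scripts, and ${jobTracker}/${nameNode} parameters are all illustrative placeholders.

<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="forkNode"/>
    <!-- fork: run two branches in parallel -->
    <fork name="forkNode">
        <path start="shellA"/>
        <path start="shellB"/>
    </fork>
    <action name="shellA">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>a.sh</exec>
            <file>${nameNode}/user/scripts/a.sh#a.sh</file>
        </shell>
        <ok to="joinNode"/>
        <error to="fail"/>
    </action>
    <action name="shellB">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>b.sh</exec>
            <file>${nameNode}/user/scripts/b.sh#b.sh</file>
        </shell>
        <ok to="joinNode"/>
        <error to="fail"/>
    </action>
    <!-- join: wait for both branches before moving on -->
    <join name="joinNode" to="end"/>
    <kill name="fail">
        <message>Workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>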

Installing hue, and integrating it with hive, hdfs, and oozie

Posted by 六月ゝ 毕业季﹏ on 2019-11-30 11:05:58
Environment requirements: jdk, maven, git

Third-party dependencies:
yum install -y gcc-c++ libxml2-devel.x86_64 libxslt-devel.x86_64 python-devel openldap-devel asciidoc cyrus-sasl-gssapi openssl-devel mysql-devel sqlite-devel gmp-devel libffi-devel npm
Install anything else that turns out to be missing.

Create the hue user and group:
groupadd hue
useradd -g hue hue

Switch to the user and enter hue's home directory:
su hue
cd ~

Pull the hue source from GitHub:
git clone https://github.com/cloudera/hue.git

Enter the hue-master directory and build:
cd hue-master
make apps
This takes quite a while; if more dependencies turn out to be missing along the way, install them yourself.

Edit the hue configuration file:
vi /home/hue/hue-master/desktop/conf/pseudo-distributed.ini

Time zone:
[desktop]
time_zone=Asia/Shanghai

User and group:
[desktop]
server_user=hue
server_group=hue
default_user=hue
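The excerpt cuts off before the integration settings the title promises. As a sketch, the relevant sections of pseudo-distributed.ini for hdfs, hive, and oozie might look like this, assuming a single-node cluster on default ports; all hostnames and ports below are placeholders:

[hadoop]
  [[hdfs_clusters]]
    [[[default]]]
      # placeholder NameNode and WebHDFS endpoints
      fs_defaultfs=hdfs://localhost:8020
      webhdfs_url=http://localhost:50070/webhdfs/v1

[beeswax]
  # placeholder HiveServer2 endpoint
  hive_server_host=localhost
  hive_server_port=10000

[liboozie]
  # placeholder Oozie server URL
  oozie_url=http://localhost:11000/oozie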

pass username and password to sqoop meta connect from oozie

Posted by 谁都会走 on 2019-11-30 09:54:12
Question

<sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <arg>job</arg>
    <arg>--meta-connect</arg>
    <arg>jdbc:mysql://FQDN:3306/sqoop</arg>
    <arg>--exec</arg>
    <arg>fabric_inventory</arg>
</sqoop>

Now, to pass username and password for the --meta-connect here, if I pass it as follows in oozie.xml:

<arg>jdbc:mysql://FQDN:3306/sqoop?user=sqoop&password=sqoop</arg>

or

<arg>jdbc:mysql://FQDN:3306/sqoop?user=sqoop&amp;password=sqoop</arg>

It
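Escaping the ampersand as &amp; is the usual first step, since the workflow file is parsed as XML. Another option (a sketch, not from the original thread) is to keep the credentials out of the workflow entirely and set Sqoop's metastore autoconnect properties in sqoop-site.xml; the values are placeholders:

<!-- sqoop-site.xml -->
<property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>sqoop</value>
</property>
<property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value>sqoop</value>
</property>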

Differences between Oozie and Azkaban

Posted by 丶灬走出姿态 on 2019-11-30 09:27:41
Differences between Oozie and Azkaban:

Workflow definition: Oozie workflows are defined in XML, while Azkaban uses properties files.
Deployment: Oozie is comparatively hard to deploy; it also pulls task logs from YARN.
Failure detection: in Azkaban, a task counts as successful as long as its process ran, even if it actually failed, which is a bug; Oozie detects task success and failure reliably.
Operating workflows: Azkaban is driven through its Web UI; Oozie supports the Web UI, a REST API, and a Java API.
Access control: Oozie has essentially no access control, while Azkaban has fairly complete permissions that let users grant read, write, and execute on workflows.
Where actions run: Oozie actions mostly run inside Hadoop, while Azkaban actions run on the Azkaban server.
Workflow state: Azkaban keeps the state of running workflows in memory; Oozie persists it in MySQL.
On failure: Azkaban loses all running workflows, whereas Oozie can resume a failed workflow where it left off.

Source: https://www.cnblogs.com/yumengfei/p/11576447.html

IOException: Filesystem closed exception when running oozie workflow

Posted by 限于喜欢 on 2019-11-30 04:24:17
We are running a workflow in Oozie. It contains two actions: the first is a MapReduce job that generates files in HDFS, and the second is a job that should copy the data from those files into a database. Both parts complete successfully, but Oozie throws an exception at the end that marks the run as a failed process. This is the exception:

2014-05-20 17:29:32,242 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:lpinsight (auth:SIMPLE) cause:java.io.IOException: Filesystem closed
2014-05-20 17:29:32,243 WARN org.apache.hadoop.mapred.Child: Error running child
java.io
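A frequent cause of this error (an assumption here; the excerpt ends before any resolution) is application code calling close() on the FileSystem handle returned by FileSystem.get(), which is shared through Hadoop's FileSystem cache, so the final cleanup then finds it already closed. One workaround is to disable the cache for the affected action via its <configuration> block:

<!-- give this action a private FileSystem instance instead of the shared cached one -->
<property>
    <name>fs.hdfs.impl.disable.cache</name>
    <value>true</value>
</property>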

Monitoring, alerting, and restarting when an oozie job dies

Posted by 放肆的年华 on 2019-11-30 01:39:01
Oozie Coordinator jobs and actions have many states, but Oozie does not seem to support automatically relaunching a job, say, 30 minutes after it enters a state such as FAILED, because its trigger conditions are limited to a few kinds: a time frequency, the availability of a dataset instance, or possibly some other external event. Since input-events and output-events do seem to support datasets, you could imagine writing a file in the workflow's error handler as a "failed" flag, making that flag the dataset precondition of each action, then re-running the action, and cleaning the flag up again before each action starts. Quite apart from how convoluted that is, such a workflow may not even be writable, or if written may turn into an infinite loop. When a job dies, it is usually because cluster resources or the network had problems at the time, so the right move is to kill the whole job and re-run it after a while (provided, of course, that your workflow logic tolerates repeated execution, regardless of when and how often it runs). For alerting, you can add a script to the error-to-kill transition that sends an email, or use the email action; for details see: https://www.cnblogs.com/wind-xwj/p/8946760.html

The input-events and output-events elements
The input events of a Coordinator application specify the input conditions that must be satisfied before a Coordinator action runs; in the current Oozie version, only dataset instances are supported.
A Coordinator action may generate one or more dataset instances
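For the alerting side mentioned above, a minimal sketch of an Oozie email action wired into the error path; the recipient address is a placeholder, and the SMTP settings in oozie-site.xml are assumed to be configured:

<action name="alert-email">
    <email xmlns="uri:oozie:email-action:0.1">
        <to>ops@example.com</to>
        <subject>Oozie workflow ${wf:id()} failed</subject>
        <body>Failed node: ${wf:lastErrorNode()}; error: ${wf:errorMessage(wf:lastErrorNode())}</body>
    </email>
    <ok to="fail"/>
    <error to="fail"/>
</action>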

Oozie > Java action > why property oozie.launcher.mapred.child.java.opts does not work

Posted by 风流意气都作罢 on 2019-11-29 16:15:37
I am working on Oozie with a Java action. The Java action should use the Java option -Xmx15g, so I set the property oozie.mapreduce.map.memory.mb to 25600 (25G) in case some extra memory is needed. After this simple setting I ran the Oozie job, and there was, of course, an OutOfMemory (heap space) error at Java runtime. So I set oozie.launcher.mapred.child.java.opts to -Xmx15g in the property node of the Java action, based on this link: http://downright-amazed.blogspot.fi/2012/02/configure-oozies-launcher-job.html . But I still got the same OutOfMemory error. Then I
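One thing worth checking (a sketch, not the resolution from the original thread): the Oozie Java action schema has its own <java-opts> element for the JVM that runs the main class, which avoids the launcher properties altogether. The main class below is hypothetical:

<action name="javaAction">
    <java>
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- hypothetical main class -->
        <main-class>com.example.Main</main-class>
        <java-opts>-Xmx15g</java-opts>
    </java>
    <ok to="end"/>
    <error to="fail"/>
</action>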

sqoop fails to store incremental state to the metastore

Posted by 与世无争的帅哥 on 2019-11-29 15:51:56
I get this on saving incremental import state:

16/05/15 21:43:05 INFO tool.ImportTool: Saving incremental import state to the metastore
16/05/15 21:43:56 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Error communicating with database
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.createInternal(HsqldbJobStorage.java:426)
    at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.update(HsqldbJobStorage.java:445)
    at org.apache.sqoop.tool.ImportTool.saveIncrementalState(ImportTool.java:164)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java
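A common trigger for this error (an assumption; the excerpt stops before any diagnosis) is several jobs hitting the embedded single-user HSQLDB metastore at once, for example parallel Oozie actions. The usual mitigation is to run "sqoop metastore" as a shared service and point clients at it in sqoop-site.xml; the host below is a placeholder:

<property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:hsqldb:hsql://metastore-host:16000/sqoop</value>
</property>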

Oozie Workflow failed due to error JA017

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-11-29 12:05:03
I am using Apache Oozie 4.3.0 with Hadoop 2.7.3. I have developed a very simple Oozie workflow, which just has a sqoop action to export system events to a MySQL table.

<workflow-app name="WorkflowWithSqoopAction" xmlns="uri:oozie:workflow:0.1">
    <start to="sqoopAction"/>
    <action name="sqoopAction">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>export --connect jdbc:mysql://localhost/airawat --username devUser --password myPwd --table eventsgranularreport --direct --enclosed-by '\"' --export
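Error JA017 generally means Oozie could not look up the Hadoop job ID of the launched action. With Hadoop 2.x, one commonly reported cause (an assumption here; the excerpt ends before the answer) is a missing or unreachable MapReduce JobHistory server. A sketch of the mapred-site.xml entries, with the host as a placeholder:

<!-- mapred-site.xml; start the daemon with mr-jobhistory-daemon.sh start historyserver -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop101:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop101:19888</value>
</property>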

Override hadoop's mapreduce.fileoutputcommitter.marksuccessfuljobs in oozie

Posted by 这一生的挚爱 on 2019-11-29 11:11:38
<property>
    <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
    <value>false</value>
</property>

I want to override the above property to true. It needs to stay false for the rest of the jobs on the cluster, but in my Oozie workflow I need Hadoop to create a _SUCCESS file in the output directory after the job completes. It is a Hive action in the workflow that writes the output. Please help.

Matthew Rathbone: Hive unfortunately overrides this capability by setting its own NullOutputCommitter:

conf.setOutputCommitter(NullOutputCommitter.class);

see src/shims/src/0.20/java/org
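For actions that do go through FileOutputCommitter (Hive's NullOutputCommitter caveat aside), the per-action override would sit in the action's <configuration> block, leaving the cluster default untouched. A sketch against the hive action schema; the script name is hypothetical:

<hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <!-- override for this action only; the cluster-wide default stays false -->
        <property>
            <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
            <value>true</value>
        </property>
    </configuration>
    <script>query.hql</script>
</hive>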