Oozie

Oozie with Hadoop 2, Job hangs in “RUNNING”

允我心安 submitted on 2019-12-01 23:29:54
I have a workflow job with a Java action node, running on Hadoop 2.1.0.2.0.4.0-38 and Oozie 3.3.2.2.0.4.0. When I submit the job, I see two lines in the Hadoop Resource Manager screen: one with the original job name, and one with the Oozie job name. The task with the Oozie job name hangs in the "RUNNING" state, while the task with the original name stays in the "ACCEPTED" state. All I see in the logs is:

    >>> Invoking Main class now >>>
    Heart beat
    Heart beat
    Heart beat
    Heart beat
    ...

Thank you.

It seems that the number of map tasks that can run in parallel is limited. Set the below property to a value higher than its current value: mapred
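For context, the endless "Heart beat" lines typically mean the Oozie launcher job is holding the only available task slot (or container), so the child job it launched can never be scheduled. A sketch of the kind of setting the truncated answer points at, assuming a classic MRv1-style slot configuration; the property name and value below are illustrative, not taken from the original answer:

```xml
<!-- mapred-site.xml on each TaskTracker: raise the number of map tasks
     that may run in parallel (value is an example) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
```

On a YARN cluster the equivalent lever is usually scheduler capacity (for example, the maximum share of resources that ApplicationMasters may consume) rather than map slots.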

How to check whether the file exist in HDFS location, using oozie?

醉酒当歌 submitted on 2019-12-01 22:24:36
Question: How can I check whether a file exists in an HDFS location, using Oozie? In my HDFS location I get a file like test_08_01_2016.csv at 11 PM on a daily basis. I want to check whether this file exists after 11:15 PM. I can schedule the batch using an Oozie coordinator job, but how can I validate that the file exists in HDFS?

Answer 1: You can use an EL expression in Oozie, like:

    <decision name="CheckFile">
        <switch>
            <case to="nextOozieTask">
                ${fs:exists('/path/test_08_01_2016.csv')} <!--do note the
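Since the snippet above is cut off, here is a minimal complete decision node built from it; the transition names `nextOozieTask` and `fileMissing` are placeholders, and `fs:exists` is the Oozie EL function the answer refers to:

```xml
<decision name="CheckFile">
  <switch>
    <!-- take "nextOozieTask" when the file exists, "fileMissing" otherwise -->
    <case to="nextOozieTask">${fs:exists('/path/test_08_01_2016.csv')}</case>
    <default to="fileMissing"/>
  </switch>
</decision>
```

A `<decision>` node needs a `<default>` transition so the workflow always has somewhere to go when no `<case>` evaluates to true.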

How to schedule a sqoop action using oozie

£可爱£侵袭症+ submitted on 2019-12-01 20:12:45
I am new to Oozie. Just wondering: how do I schedule a sqoop job using Oozie? I know a sqoop action can be added as part of an Oozie workflow, but how can I schedule that sqoop action to run automatically, say every 2 minutes, or at 8 PM every day (just like a cron job)?

You need to create a coordinator.xml file with a start, an end, and a frequency. Here is an example:

    <coordinator-app name="example-coord" xmlns="uri:oozie:coordinator:0.2"
            frequency="${coord:days(7)}" start="${start}" end="${end}"
            timezone="America/New_York">
        <controls>
            <timeout>5</timeout>
        </controls>
        <action>
            <workflow>
                <app-path>$
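The example above is truncated, so here is a complete minimal coordinator matching the "every 2 minutes" case from the question; `${workflowAppPath}`, `${start}`, and `${end}` are placeholder properties that would be supplied in job.properties:

```xml
<coordinator-app name="sqoop-coord" xmlns="uri:oozie:coordinator:0.2"
                 frequency="${coord:minutes(2)}"
                 start="${start}" end="${end}" timezone="America/New_York">
  <action>
    <workflow>
      <!-- HDFS path of the workflow application containing the sqoop action -->
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

For the "8 PM every day" case, you would instead set `frequency="${coord:days(1)}"` and choose a start time of 20:00 in the coordinator's timezone.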

org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120

风流意气都作罢 submitted on 2019-12-01 17:42:49
I'm running a Hadoop job (from Oozie) that has a few counters and multiple outputs. I get an error like:

    org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120

I then removed all the code that uses counters, set mout.setCountersEnabled to false, and set the max counters to 240 in the Hadoop config. I still get the same error:

    org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 241 max=240

How can I solve this problem? Is it possible that hidden counters exist? How can I find out which counters are there before
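Two details are worth noting here. First, the "hidden" counters are real: the MapReduce framework, file systems, and multiple-output support all register counters of their own, so a job can exceed the limit even with no user counters. Second, the limit is enforced by the cluster's configuration, not just the client's, so it must be raised in the cluster-side mapred-site.xml (and the services restarted). A sketch, with the value chosen as an example:

```xml
<!-- mapred-site.xml on the cluster, not only in the submitted job's config;
     the default limit in Hadoop 2 is 120 -->
<property>
  <name>mapreduce.job.counters.max</name>
  <value>500</value>
</property>
```

The "241 max=240" message in the question suggests the raised limit did take effect; the total counter count (user + framework + per-output counters) simply grew past it as well.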

sqoop job shell script execute parallel in oozie

烈酒焚心 submitted on 2019-12-01 13:56:01
I have a shell script which executes a sqoop job. The script is below:

    #!/bin/bash
    table=$1
    sqoop job --exec ${table}

When I pass the table name in the workflow, the sqoop job executes successfully. The workflow is below:

    <workflow-app name="Shell_script" xmlns="uri:oozie:workflow:0.5">
        <start to="shell_script"/>
        <kill name="Kill">
            <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <action name="shell_script">
            <shell xmlns="uri:oozie:shell-action:0.1">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <exec>sqoopjob
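To run the same shell action for several tables in parallel, as the title asks, the usual Oozie pattern is a fork/join pair. A sketch under assumptions: the script is named sqoopjob.sh, the action and node names are invented, and each parallel action only differs in the table argument it passes:

```xml
<!-- fork starts both shell actions concurrently -->
<fork name="forkTables">
  <path start="shell_table1"/>
  <path start="shell_table2"/>
</fork>
<action name="shell_table1">
  <shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>sqoopjob.sh</exec>
    <argument>table1</argument>
    <file>sqoopjob.sh</file>
  </shell>
  <ok to="joinTables"/>
  <error to="Kill"/>
</action>
<!-- shell_table2 would be identical apart from the argument -->
<!-- the join waits for every forked path before continuing -->
<join name="joinTables" to="end"/>
```

Every path started by the fork must transition to the same join node, and the workflow only proceeds past the join once all paths have completed.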

Apache Oozie

最后都变了- submitted on 2019-12-01 06:44:40
1. Apache Oozie

Oozie is a workflow scheduling system. It is a Java web application that runs in a Java Servlet container. Oozie schedules work as a directed acyclic graph (DAG) and uses XML files to configure workflows. It was originally developed by Cloudera and later contributed to Apache.

a. Apache Oozie architecture

Oozie's webapp (the Oozie server) provides a UI and accepts jobs submitted by clients, then submits them to the Hadoop cluster: it launches a MapReduce job with a single map task and no reduce tasks, which coordinates the actual execution of the workflow's tasks and hands the concrete work to the relevant services.

b. Basic principles

Workflows are configured in XML:
workflow.xml: configures the concrete execution of the workflow (the DAG expressed as a configuration file).
job.properties: the workflow's general configuration file, holding execution parameters.

Nodes fall into two classes:
Control nodes describe the flow of the workflow: start, end, fork, join, kill.
Action nodes do the actual work, including but not limited to: mr, java, hive, shell, spark, etc.

c. Workflow types

workFlow: sequentially executed flow nodes; ordinary workflow scheduling, with no timing and no batching.
coordinator: supports scheduled, recurring jobs.
bundle: supports both scheduling and batching.

Source: https://www.cnblogs.com/qidi/p
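A minimal illustration of the two files described above; the application name, action, and transitions are placeholders, not from the source:

```xml
<!-- workflow.xml: one shell action between the start and end control nodes -->
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="hello"/>
  <action name="hello">
    <shell xmlns="uri:oozie:shell-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>echo</exec>
      <argument>hello</argument>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The companion job.properties would then supply values such as `nameNode`, `jobTracker`, and `oozie.wf.application.path` (the HDFS directory holding this workflow.xml).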

【Oozie-4.1.0】Oozie-4.1.0 + hadoop-2.7.1

扶醉桌前 submitted on 2019-12-01 06:40:02
1. Environment

maven-3.3.0
hadoop-2.7.1

2. Build

Download: http://apache.mirrors.pair.com/oozie/4.1.0/oozie-4.1.0.tar.gz

    [root@hftclclw0001 opt]# pwd
    /opt
    [root@hftclclw0001 opt]# wget http://apache.mirrors.pair.com/oozie/4.1.0/oozie-4.1.0.tar.gz
    [root@hftclclw0001 opt]# tar -zxvf oozie-4.1.0.tar.gz
    [root@hftclclw0001 opt]# cd oozie-4.1.0

    # Defaults:
    # sqoop.version=1.4.3
    # hive.version=0.13.1  => build fails when changed to another version
    # hbase.version=0.94.2 => build fails when changed to another version
    # pig.version=0.12.1
    # hadoop.version=2.3.0 => the default is 2.3.0, but 2.7.1 is supported
    # tomcat.version=6.0.43

    [root@hftclclw0001 opt]# ./bin/mkdistro.sh -DskipTests -Phadoop-2 -Dsqoop.version=1.4.6
    ... ... ..

oozie VS azkaban

北城以北 submitted on 2019-12-01 06:39:03
Oozie can restart from the point of failure; Azkaban cannot.
Oozie persists its flows in a database, while Azkaban keeps them in memory.
Azkaban must determine the execution path before starting a job, whereas Oozie allows nodes to decide for themselves.
Azkaban does not support event triggers.
Azkaban uses simpler workflows.

Source: oschina. Link: https://my.oschina.net/u/1421929/blog/656991

Oozie安装总结

99封情书 submitted on 2019-12-01 06:38:51
Copyright notice: this is the blogger's original article and may not be reproduced without the blogger's permission.

1. Installing Oozie by adding the service through CM

If creating the Oozie database fails with a message that the database already exists (see figure 1), Oozie was probably installed before and not cleanly removed. Manually delete the Oozie server data directory (the oozie/data part); see figure 2.

(Figure 1) (Figure 2)

2. After a successful installation, Oozie can be accessed via the web at http://hadoop1:11000/oozie/, but at this point the page prompts that Ext JS is required. Per the official documentation, only Ext JS 2.2 can be used (for licensing reasons). Put ext-2.2.zip into the libext directory (/opt/cloudera/parcels/CDH/lib/oozie/libext) and unzip it (unzip ext-2.2.zip), then change the permissions of the extracted ext-2.2 to 777 (chmod -R 777 ext-2.2). There is no need to restart Oozie; just refresh the browser to see the Oozie web page.

3. Change Oozie's metadata database to MySQL; the default is Derby, which is not suitable for production use.
1. See the CM configuration below:
2. Create the oozie database on the corresponding MySQL server: create database oozie;
3. In the libserver directory on the node where Oozie is installed (/opt/cloudera/parcels/CDH/lib/oozie/libserver
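Step 2 above only creates the database; in practice Oozie also needs a MySQL account with privileges on it. A sketch of the full database-side setup, with the user name and password as placeholder examples (they must match what is configured for Oozie in CM):

```sql
-- run on the MySQL server; credentials are examples only
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8;
CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';
FLUSH PRIVILEGES;
```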

Oozie s3 as job folder

Deadly submitted on 2019-12-01 01:50:38
Oozie fails with the following error when workflow.xml is provided from S3, but the same job works when workflow.xml is provided from HDFS. This worked with earlier versions of Oozie; has anything changed in Oozie 4.3?

Env: HDP 3.1.0, Oozie 4.3.1

    oozie.service.HadoopAccessorService.supported.filesystems=*

job.properties:

    nameNode=hdfs://ambari-master-1a.xdata.com:8020
    jobTracker=ambari-master-2a.xdata.com:8050
    queue=default
    #OOZIE job details
    basepath=s3a://mybucket/test/oozie
    oozie.use.system.libpath=true
    oozie.wf.application.path=${basepath}/jobs/test-hive
    #(works with this