Oozie

Hadoop job fails, Resource Manager doesn't recognize AttemptID

谁都会走 submitted on 2019-12-08 19:03:10
Question: I'm trying to aggregate some data in an Oozie workflow, but the aggregation step fails. I found two points of interest in the logs. The first is an error(?) that seems to occur repeatedly: after a container finishes, it gets killed but exits with the non-zero exit code 143. It finishes: 2015-05-04 15:35:12,013 INFO [IPC Server handler 7 on 49697] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1430730089455_0009_m_000048_0 is : 0.7231312 2015-05-04 15:35:12

Oozie shell action not running as submitting user

☆樱花仙子☆ submitted on 2019-12-08 17:19:02
Question: I've written an Oozie workflow that runs a Bash shell script to run some Hive queries and perform some actions on the results. The script runs but throws a permission error when accessing some of the HDFS data. The user who submitted the Oozie workflow has permission, but the script runs as the yarn user. Is it possible to make Oozie execute the script as the user who submitted the workflow? Hive and Java actions both execute as the submitting user; only the shell action behaves differently.

Exporting jobs listed in Oozie Web Console

こ雲淡風輕ζ submitted on 2019-12-08 13:31:19
Question: Apologies if this question sounds basic; I'm totally new to the Hadoop environment. What am I looking for? In my case, there are jobs scheduled to run every day, and I want to export the list of failed jobs to an Excel sheet each day. How do I view the workflow jobs? Currently I use the Oozie web console to view the jobs, and I don't see an option to export. I was also unable to find this information in the Oozie documentation. However, I found that jobs can be listed using commands
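The export step described in the question can be sketched in Python. This is a minimal sketch under stated assumptions, not an Oozie feature: it assumes the job list has already been obtained as a list of dictionaries (for example from JSON output), and the field names `id`, `appName`, `status`, `startTime`, `endTime` as well as the choice of which statuses count as failed are assumptions to adjust for the actual data. It writes CSV, which Excel can open.

```python
import csv
import io

# Statuses treated as "failed" in this sketch; this set is an assumption.
FAILED_STATUSES = {"FAILED", "KILLED"}


def failed_jobs_to_csv(jobs, columns=("id", "appName", "status", "startTime", "endTime")):
    """Return CSV text containing only the jobs whose status counts as failed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for job in jobs:
        if job.get("status") in FAILED_STATUSES:
            # Missing fields become empty cells instead of raising an error.
            writer.writerow({name: job.get(name, "") for name in columns})
    return buffer.getvalue()


# Hypothetical sample records, made up for illustration only.
sample_jobs = [
    {"id": "job-1", "appName": "daily-load", "status": "SUCCEEDED"},
    {"id": "job-2", "appName": "daily-agg", "status": "KILLED"},
]

# To produce a file, write the returned text to e.g. failed_jobs.csv instead.
print(failed_jobs_to_csv(sample_jobs))
```

Keeping the filtering separate from how the job list is fetched means the same function works whether the records come from a command-line listing or from any other source.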

Running multiple mapreduce jobs with oozie?

情到浓时终转凉″ submitted on 2019-12-08 13:16:49
Question: As part of a workaround, I want to use two MapReduce jobs (instead of one) that run in sequence to give the desired effect. The map function in each job simply emits each key/value pair without processing. The reduce functions differ, as each does a different kind of processing. I stumbled upon Oozie, and it seems to write directly to the input stream of the subsequent job (or doesn't it?) - this would be great since the intermediate data is large (I/O operation would

RM job was stuck when running with oozie

♀尐吖头ヾ submitted on 2019-12-08 12:24:56
Question: I'm running a MapReduce wordcount job on Oozie. Two jobs were submitted to YARN; the monitoring job ran up to 99% and then got stuck, while the wordcount job stayed at 0%. When I kill the monitoring job, the wordcount job runs smoothly. I use a cluster of 3 virtual machines, configured as follows: Profile per VM: cores=2 memory=2048MB reserved=0GB usableMem=0GB disks=1 Num Container=3 Container Ram=640MB Used Ram=1GB Unused Ram=0GB yarn.scheduler.minimum-allocation-mb=640 yarn.scheduler

OOZIE : Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]

邮差的信 submitted on 2019-12-08 11:19:27
Question: I'm trying to run an Oozie job by following this URL: https://www.safaribooksonline.com/library/view/apache-oozie/9781449369910/ch05.html When executing oozie job -run -config target/example/job.properties I get this error: Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 1 sec. Retry count = 1 Connection exception has occurred [ java.net.ConnectException Connection refused (Connection refused) ]. Trying after 2 sec. Retry

No function is mapped to the name “coord:formatTime”

旧时模样 submitted on 2019-12-08 08:06:06
Question: I am trying to get the current timestamp in Oozie using the following: <property> <name>date</name> <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1, 'DAY'), "yyyy-MM-dd")} </value> </property> My Hive action is: <script>/abc/test.hql</script> <param>DATE=${date}</param> The Hive action fails with: EL_ERROR No function is mapped to the name "coord:formatTime" Any idea why? I want my date as YYYY-MM-DD HH-MM-SS.
Answer 1: ${coord:formatTime(coord:dateOffset(coord:nominalTime
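As a rough illustration of the value the expression in the question is meant to produce (the nominal time shifted back one day, then formatted), here is a minimal Python sketch. It is not a fix for the EL error and does not use Oozie: the function name `previous_day` and the sample timestamp are hypothetical, and the patterns are Python `strftime` patterns, which differ from the `yyyy-MM-dd` style pattern shown in the question.

```python
from datetime import datetime, timedelta


def previous_day(nominal_time, pattern="%Y-%m-%d"):
    """Shift the given time back by one day and format it with the pattern."""
    return (nominal_time - timedelta(days=1)).strftime(pattern)


# Hypothetical nominal time, chosen only for illustration.
nominal = datetime(2019, 12, 8, 8, 6, 6)

print(previous_day(nominal))                       # 2019-12-07
print(previous_day(nominal, "%Y-%m-%d %H-%M-%S"))  # 2019-12-07 08-06-06
```

The second call shows the date-and-time layout the asker wants (YYYY-MM-DD HH-MM-SS) for the same shifted timestamp.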

Error: E0902: Exception occured: [User: Root is not allowed to impersonate root

梦想与她 submitted on 2019-12-08 06:35:49
Question: I am trying to follow the steps given at http://www.rohitmenon.com/index.php/apache-oozie-installation/ Note: I am not using the Cloudera distribution of Hadoop. The above link is similar to http://oozie.apache.org/docs/4.0.1/DG_QuickStart.html but seems more descriptive to me. However, while running the command below as the root user, I get an exception: ./bin/oozie-setup.sh sharelib create -fs Note: I have two live nodes shown at dfshealth.jsp, and I have updated the core-site.xml for all

Oozie > what is the difference between asynchronous actions and synchronous actions

末鹿安然 submitted on 2019-12-08 02:23:44
Question: I read on the official Oozie site: Actions Are Asynchronous All computation/processing tasks triggered by an action node are executed asynchronously by Oozie. For most types of computation/processing tasks triggered by workflow action, the workflow job has to wait until the computation/processing task completes before transitioning to the following node in the workflow. Whereas on a different page of the same site: Fs HDFS action The introduction of the FS action (a synchronous action) says: The

Optimize multiple Hive QL in Oozie

本秂侑毒 submitted on 2019-12-08 01:14:25
Question: I am not familiar enough with Hive, so here I am. We are using Oozie to chain a bunch of Hive QL jobs together. I was tasked with optimizing an application that is already running in our production environment. The business partners don't want it to take longer than about 1.5 hours. One of the first things I noticed was that there are around 90 Oozie actions within this one workflow. We also share a YARN queue with other applications. Half of those actions are hive2 actions, and each of the