Oozie

Why does the oozie launcher consume 2 yarn containers?

Submitted by 好久不见 on 2019-12-25 08:19:25
Question: I am using Oozie to execute a Spark job. I "kind of" understand that Oozie launches a map-only MapReduce job and from there launches the Spark job. What I do not understand is why this job consumes 2 YARN containers. In YARN's Resource Manager page (titled "All Applications") I see something like this:

    ID: application_nnnnn_3456   Name: oozie: launcher...   Application Type: MAPREDUCE   Running Containers: 2
    ID: application_nnnnn_3457   Name: spark-app            Application Type: SPARK       Running Containers: 1

Is …
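A note on why this happens: the launcher is itself a one-map MapReduce job, so its two containers are the MapReduce ApplicationMaster plus the single map task that submits the Spark job; the Spark application then acquires containers of its own. If the launcher's footprint matters, oozie.launcher.*-prefixed properties in the action's configuration size the launcher rather than the launched job. A minimal sketch, with illustrative values and ${...} variables:

    <action name="spark-job">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- oozie.launcher.* properties apply to the launcher MR job,
                     not to the Spark application it submits -->
                <property>
                    <name>oozie.launcher.mapreduce.map.memory.mb</name>
                    <value>512</value>
                </property>
                <property>
                    <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
                    <value>512</value>
                </property>
            </configuration>
            <master>yarn-cluster</master>
            <name>spark-app</name>
            <jar>${appPath}/lib/spark-app.jar</jar>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>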

Dynamically calculating oozie parameter (number of reducers for MR action)

Submitted by 落花浮王杯 on 2019-12-25 07:18:43
Question: In my Oozie workflow I dynamically create a Hive table, say T1. This Hive action is then followed by a MapReduce action. I want to set the number-of-reducers property (mapred.reduce.tasks) equal to the number of distinct values of a field, say T1.group. Any ideas on how to set the value of an Oozie parameter dynamically, and how to get the result of the Hive DISTINCT query into that Oozie parameter?

Answer 1: I hope this can help: Create the Hive table as you are doing already. Execute another Hive query which …
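One standard way to wire this up, sketched below with hypothetical names (count_groups.sh and the num_groups key are illustrative): run the distinct count from a shell action that prints a key=value line and declares <capture-output/>, then read the captured value in the MapReduce action via the wf:actionData EL function.

    <action name="count-groups">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- count_groups.sh runs something like
                 hive -e "SELECT COUNT(DISTINCT `group`) FROM T1"
                 and prints a line of the form num_groups=<n> -->
            <exec>count_groups.sh</exec>
            <file>${appPath}/count_groups.sh#count_groups.sh</file>
            <capture-output/>
        </shell>
        <ok to="mr-job"/>
        <error to="fail"/>
    </action>

    <action name="mr-job">
        <map-reduce>
            ...
            <configuration>
                <property>
                    <name>mapred.reduce.tasks</name>
                    <value>${wf:actionData('count-groups')['num_groups']}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>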

Install oozie sharelib

Submitted by 有些话、适合烂在心里 on 2019-12-25 02:25:22
Question: I want to install the Oozie sharelib to HDFS as part of a test setup, without building the Oozie package. I couldn't find oozie-sharelib.tar.gz in any of the repositories. Any idea if it can be done without downloading/building Oozie? Thanks!

Answer 1: The Oozie sharelib is bundled with Oozie. You must download/install Oozie; it will be in the $OOZIE_HOME folder.

Answer 2: The sharelib is the set of jar files required for various operations in the Oozie environment, and it is created by a successful build of Oozie. It is possible …
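For reference: once you have an Oozie distribution, the sharelib tarball inside it is normally installed to HDFS with the bundled setup script, e.g. $OOZIE_HOME/bin/oozie-setup.sh sharelib create -fs hdfs://namenode:8020 (the NameNode URI here is illustrative), which uploads it to the share/lib directory under the home of the user running the command, typically /user/oozie/share/lib.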

Scheduling an ad-hoc query with Hive/Hadoop using Oozie

Submitted by こ雲淡風輕ζ on 2019-12-24 20:36:04
Question: Does Oozie support a user scheduling an ad-hoc Hive query via a REST API? We're building a system where a user can search documents in Hadoop, with support for the user (optionally) specifying some attributes of the data to be searched, using Hive to perform the query against Hadoop. Because of this support for optional fields, we don't know ahead of time what the Hive query will look like (in terms of which tables will be used). We have a service where, at run time, we …
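Oozie can at least parameterize a Hive action: <param> values supplied as workflow properties at submission time (the REST jobs endpoint accepts these in the job configuration) are substituted into the script. A sketch with hypothetical names (search.q, TABLE, FILTER):

    <action name="adhoc-query">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>search.q</script>
            <!-- substituted into search.q wherever ${TABLE} / ${FILTER} appear -->
            <param>TABLE=${table}</param>
            <param>FILTER=${filter}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>

Here search.q might read SELECT * FROM ${TABLE} WHERE ${FILTER}; queries whose shape is wholly unknown in advance usually mean generating the script itself before submission.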

Shortening Oozie workflows

Submitted by 跟風遠走 on 2019-12-24 16:18:03
Question: I'm using Oozie to string together a set of MapReduce jobs. The individual stubs for each job are about 400 lines long because they require lots of properties. Most of these properties are identical between jobs and use configuration set in config-default.xml. I want to be able to shorten each stub and centralise the common properties, as it's getting pretty impractical to work out which properties are common when creating a new job. The obvious solution is to shorten my workflows by …
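One mechanism aimed at exactly this, available from workflow schema 0.4 onward: a <global> section whose job-tracker, name-node, and configuration are inherited by every action, so common properties are declared once per workflow. A sketch:

    <workflow-app name="my-wf" xmlns="uri:oozie:workflow:0.4">
        <global>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </global>
        <!-- actions below can now omit these common elements -->
        ...
    </workflow-app>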

Move files in S3 using oozie

Submitted by 时间秒杀一切 on 2019-12-24 16:15:02
Question: I want to move files in S3 using Oozie on AWS. I want to run:

    aws s3 mv s3://temp/*.zip s3://temp/processed_files/. --recursive

How can I do this in Oozie?

EDIT 1:

    2015-11-12 10:18:55,758 WARN ShellActionExecutor:542 - USER[hadoop] GROUP[-] TOKEN[] APP[rad_workflow] JOB[0000118-151029144311676-oozie-oozi-W] ACTION[0000118-151029144311676-oozie-oozi-W@sh] Launcher exception: Cannot run program "move.sh" (in directory "/mnt1/yarn/usercache/hadoop/appcache/application_1446129655727_0421/container…
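The launcher exception in EDIT 1 usually means the script was never shipped to the container: a shell action needs a <file> element so move.sh lands in the task's working directory (and the AWS CLI must be installed on the worker nodes). A sketch, with an illustrative ${appPath}:

    <action name="sh">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>move.sh</exec>
            <!-- copies move.sh from the workflow dir on HDFS
                 into the container's working directory -->
            <file>${appPath}/move.sh#move.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

Here move.sh would contain the aws s3 mv command shown above.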

Impersonate oozie job - permission issue

Submitted by ⅰ亾dé卋堺 on 2019-12-24 15:58:21
Question: I am trying to execute a bash script containing multiple Hive commands using Oozie, and I get a security exception:

    Permission denied: user=yarn, access=WRITE, inode="/user":hdfs:hdfs:drwxr-xr-x

Extra info: the submit command was issued as the hdfs user. I have tried the impersonation option (-doas hdfs). Disabling the security check solves the problem but causes a different one (FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask), using:

    <property> <name…
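A widely used workaround on non-Kerberos clusters, sketched below: the shell action's script runs as the Unix user owning the YARN container (here yarn), so HDFS rejects its writes; exporting HADOOP_USER_NAME makes Hadoop clients inside the script act as the workflow submitter instead. wf:user() is a standard Oozie EL function; the script name is illustrative.

    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>script.sh</exec>
        <!-- on an unsecured cluster, Hadoop trusts this variable for identity -->
        <env-var>HADOOP_USER_NAME=${wf:user()}</env-var>
        <file>${appPath}/script.sh#script.sh</file>
    </shell>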

Oozie - Task Logs Do not Display

Submitted by 限于喜欢 on 2019-12-24 11:34:35
Question: Using CDH 5, when I run my Oozie workflow I no longer see log statements from my mappers (log4j, slf4j). I even tried System.out.println; I still don't see the statements. Is there a setting I'm missing?

Answer 1: It turned out that the logs are still there, except you need to manually point your browser to them. For example, clicking on a map-reduce action still opens a job log page such as http://localhost:50030/jobdetails.jsp?jobid=job_201510061631_2112. However, to get the result for …
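On clusters where the jobs run on YARN rather than MRv1, and with log aggregation enabled, the same task logs can also be pulled from the command line with yarn logs -applicationId application_..., which avoids hunting for the right UI link.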

Adding multiple jars in Oozie-Spark action

Submitted by 本秂侑毒 on 2019-12-24 08:30:13
Question: I'm using HDP 2.6, where Oozie 4.2 and Spark2 are installed. I followed the Hortonworks guide at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html for adding the Spark2 libs to Oozie 4.2. I then submit the job with this add-on:

    oozie.action.sharelib.for.spark=spark2

The error I'm getting is this:

    2017-07-19 12:36:53,271 WARN SparkActionExecutor:523 - SERVER[] USER[admin] GROUP[-] TOKEN[] APP[Workflow2]…
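Two general Oozie details that often matter here, independent of the Hortonworks guide's exact steps: the sharelib override can also be set per action in the workflow XML rather than in job.properties, and oozie admin -shareliblist spark2 shows whether the server actually picked up the spark2 directory. A sketch, with illustrative ${...} variables:

    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- a comma-separated list can pull jars from several sharelib dirs -->
            <property>
                <name>oozie.action.sharelib.for.spark</name>
                <value>spark2</value>
            </property>
        </configuration>
        <master>yarn-cluster</master>
        <name>spark-app</name>
        <jar>${appPath}/lib/app.jar</jar>
    </spark>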

Forcing Oozie job to run on specific node

Submitted by 夙愿已清 on 2019-12-24 06:45:12
Question: I have a 6-node cluster. When I try to run an Oozie job, it triggers the job on any of the 6 nodes. Is there a way to specify the node on which the Oozie shell action should be triggered?

Answer 1: You can use Oozie's spark-action for this purpose. Refer to: https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html

Source: https://stackoverflow.com/questions/38691830/forcing-oozie-job-to-run-on-specific-node
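Note that Oozie itself delegates container placement to YARN; when work must run on particular hosts, YARN node labels (mapping a queue to specific nodes) are the usual mechanism rather than anything in the workflow definition.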