Hive query execution for a custom UDF expects an HDFS jar path instead of a local path in CDH4 with an Oozie flow

Posted by 浪子不回头ぞ on 2019-12-13 04:53:33

Question


We are migrating from CDH3 to CDH4 and, as part of the migration, moving over all the jobs we run on CDH3. We have hit one critical issue. An Oozie workflow runs a Python script that internally invokes a Hive query (hive -e {query}). The query adds a custom jar with add jar {LOCAL PATH FOR JAR} and creates a temporary function for a custom UDF. Everything looks fine up to this point. But when the query starts executing the custom UDF, it fails in the distributed cache with a FileNotFoundException: it looks for the jar at an HDFS path instead of the local path.

I am not sure if I am missing some configuration here.

Exception trace:

WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Execution log at: /tmp/yarn/yarn_20131107020505_79b41443-b9f4-4d36-a0eb-4f0d79cd3ce9.log
java.io.FileNotFoundException: File does not exist: hdfs://aa.bb.com:8020/opt/nfsmount/mypath/custom.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:824)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    ..... .....

Any help on this is highly appreciated.

Regards, GHK.


Answer 1:


There are a few options. All the required jars must be on the classpath before you run the Hive query.

Option 1: Add your custom jar to the Oozie workflow with <file>/hdfs/path/to/your/jar</file>.

Option 2: Pass --auxpath /local/path/to/your/jar when calling Hive from your Python script, e.g. hive --auxpath /local/path/to/your.jar -e {query}
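As a rough illustration of option 2, the Python script could assemble the Hive command with --auxpath before the -e query. This is only a sketch: the helper names build_hive_command and run_hive_query are hypothetical, the jar path is the placeholder from the answer, and it assumes the hive executable is on the PATH of the node where Oozie runs the script.

```python
import subprocess


def build_hive_command(query, aux_jar):
    """Build the argument list for: hive --auxpath <aux_jar> -e <query>.

    Both names are illustrative; aux_jar is a local filesystem path
    to the custom UDF jar (placeholder value in the example below).
    """
    return ["hive", "--auxpath", aux_jar, "-e", query]


def run_hive_query(query, aux_jar):
    """Run the query through the hive CLI and return its raw output.

    Assumes the `hive` executable is on PATH; raises
    subprocess.CalledProcessError if hive exits with a non-zero status.
    """
    return subprocess.check_output(build_hive_command(query, aux_jar))


if __name__ == "__main__":
    # Placeholder path and query, for illustration only.
    cmd = build_hive_command("SHOW FUNCTIONS", "/local/path/to/your.jar")
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the query itself contains quotes or spaces.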



Source: https://stackoverflow.com/questions/19852632/hive-query-execution-for-custom-udf-is-exepecting-hdfs-jar-path-instead-of-local
