Question
We are migrating from CDH3 to CDH4, and as part of this migration we are moving over all the jobs that we have on CDH3. We have hit one critical issue. An Oozie workflow runs a Python script that internally invokes a Hive query (hive -e {query}). In this query we add a custom jar using add jar {LOCAL PATH FOR JAR} and create a temporary function for a custom UDF. Everything looks fine up to this point. But when the query starts executing with the custom UDF function, it fails with a distributed cache FileNotFoundException: it looks for the jar at an HDFS path instead of the local path.
I am not sure if I am missing some configuration here.
Exception trace:
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Execution log at: /tmp/yarn/yarn_20131107020505_79b41443-b9f4-4d36-a0eb-4f0d79cd3ce9.log
java.io.FileNotFoundException: File does not exist: hdfs://aa.bb.com:8020/opt/nfsmount/mypath/custom.jar
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:824)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
    at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
    ..... .....
Any help on this is highly appreciated.
Regards, GHK.
Answer 1:
There are a few options. All the required jars should be on the classpath before you run the Hive query.
Option 1: Add your custom jar to the Oozie workflow with <file>/hdfs/path/to/your/jar</file>.
Option 2: Pass the option --auxpath /local/path/to/your/jar when calling Hive from your Python script, e.g.: hive --auxpath /local/path/to/your.jar -e {query}
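Option 2 could look roughly like this on the Python side. This is only a sketch and not the asker's actual script: it assumes Python 3 and that the hive executable is on the PATH of the node where Oozie runs the script. The helper names build_hive_command and run_hive_query are made up for illustration, and the jar path is the placeholder from the answer.

```python
import shlex
import subprocess


def build_hive_command(query, jar_path):
    """Build the argument list for `hive --auxpath <jar> -e <query>`.

    Hypothetical helper: the jar is passed through --auxpath so it is on
    Hive's classpath at startup, as option 2 suggests.
    """
    return ["hive", "--auxpath", jar_path, "-e", query]


def run_hive_query(query, jar_path):
    """Run the query through the Hive CLI and return its standard output.

    Assumes `hive` is on the PATH; raises CalledProcessError if Hive exits
    with a non-zero status.
    """
    cmd = build_hive_command(query, jar_path)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


# Placeholder values; nothing is executed here, the command is only printed.
example_cmd = build_hive_command("SELECT 1", "/local/path/to/your.jar")
print(" ".join(shlex.quote(part) for part in example_cmd))
```

Passing the arguments as a list avoids shell quoting problems when the query contains quotes or spaces. The local jar path still has to exist on whichever node ends up running the script.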
Source: https://stackoverflow.com/questions/19852632/hive-query-execution-for-custom-udf-is-exepecting-hdfs-jar-path-instead-of-local