How to call a Hive UDF written in Java from PySpark using a HiveContext

Submitted by 你说的曾经没有我的故事 on 2019-12-23 23:16:12

Question


I use the getLastProcessedVal2 UDF in Hive to get the latest partitions from a table. The UDF is written in Java. I would like to call the same UDF from PySpark using a HiveContext.

dfsql_sel_nxt_batch_id_ini=sqlContext.sql(''' select l4_xxxx_seee.getLastProcessedVal2("/data/l4/work/hive/l4__stge/proctl_stg","APP_AMLMKTE_L1","L1_AMLMKT_MDWE","TRE_EXTION","2.1")''')

Error:

ERROR exec.FunctionRegistry: Unable to load UDF class: java.lang.ClassNotFoundException:


Answer 1:


Start your PySpark shell with the UDF jar on the classpath:

pyspark --jars /path/to/udf.jar <all-other-param>

OR

submit your PySpark job with the --jars option:

spark-submit --jars /path/to/udf.jar <all-other-param>
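
Adding the jar only puts the class on the classpath. If the function is not already registered as a permanent function in the Hive metastore, register it from the HiveContext before calling it. A minimal sketch, assuming the fully qualified class name is com.example.udf.GetLastProcessedVal2 (hypothetical; substitute your UDF's actual class):

from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext()
sqlContext = HiveContext(sc)

# Register the Java UDF as a temporary Hive function under a SQL name.
# The class name below is a placeholder; use your UDF's actual class.
sqlContext.sql(
    "CREATE TEMPORARY FUNCTION getLastProcessedVal2 "
    "AS 'com.example.udf.GetLastProcessedVal2'")

# The function can now be called from SQL as in the question.
sqlContext.sql('''select getLastProcessedVal2(
    "/data/l4/work/hive/l4__stge/proctl_stg",
    "APP_AMLMKTE_L1", "L1_AMLMKT_MDWE", "TRE_EXTION", "2.1")''').show()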




Answer 2:


You can register a user-defined function through the SQLContext's udf registry. The first parameter is a string naming the function, and that name is how you call it in SQL queries.

e.g.

import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

// Register "slen" under that name for use in SQL queries.
sqlContext.udf().register("slen",
       (UDF1<String, Integer>) arg1 -> arg1.length(),
       DataTypes.IntegerType);

sqlContext.sql("SELECT slen(name) FROM user").show();


Source: https://stackoverflow.com/questions/38491483/how-to-call-a-hive-udf-written-in-java-using-pyspark-from-hive-context
