Joining two tables on a UDF in Hive

Submitted by 放肆的年华 on 2020-02-01 05:37:26

Question


A basic question before I write a UDF to be used in Hive. I want to join two tables based on a custom UDF that takes one argument from table A and another from table B. I have seen examples of UDFs that take arguments from only one of the tables being joined. Does taking arguments from both tables work equally well?


Answer 1:


It sounds like you want a function along these lines:

function my_udf(val_A, val_B):
    trans_A = <do something to val_A>
    trans_B = <do something to val_B>
    return trans_A cmp trans_B

The UDF will return a boolean, which you can use in an ON clause.

I'm not sure you can do this directly in Hive, but you can always use two UDFs to transform val_A to trans_A and val_B to trans_B, and then use a normal equality condition in ON:

select *
from
    (select *, udf_A(some_column) as trans_A from A) as AA
    JOIN
    (select *, udf_B(some_column) as trans_B from B) as BB on AA.trans_A = BB.trans_B
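
If the comparison cannot be split into two independent transformations, one possible fallback is to apply a two-argument boolean UDF as a filter over a cross join. The following is only a sketch: `my_udf` is a hypothetical function assumed to be already registered and to return a BOOLEAN, and the table and column names simply follow the example above. The predicate sits in WHERE rather than ON because, as far as I know, Hive releases before 2.2.0 accept only equality conditions in ON.

```sql
-- Hypothetical: my_udf(val_A, val_B) returns BOOLEAN and has been registered,
-- for example with CREATE TEMPORARY FUNCTION.
-- Every row of A is paired with every row of B before filtering, so this
-- can be very expensive on large tables, and it may be rejected if strict
-- checks against Cartesian products are enabled.
SELECT AA.*, BB.*
FROM A AA
CROSS JOIN B BB
WHERE my_udf(AA.some_column, BB.some_column);
```

When the match can be expressed as equality of two transformed values, the two-UDF form above is usually the better choice, since it lets Hive run an ordinary equi-join instead of comparing all row pairs.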


Source: https://stackoverflow.com/questions/18039235/joining-two-tables-on-a-udf-in-hive
