Join 2 tables in Hive using a phone number and a prefix (variable length)

Submitted by ♀尐吖头ヾ on 2019-12-13 01:34:36

Question


I'm trying to match phone numbers to an area using Hive. I have a table (prefmap) that maps a number prefix (prefix) to an area (area), and another table (users) with a list of phone numbers (nb). There is only one match per phone number (no sub-areas).

The problem is that the prefixes do not have a fixed length, so I cannot use the substr() UDF with the prefix's length in the JOIN's ON() condition to match a substring of the number against a prefix.

And when I try to use instr() to find if a number has a matching prefix:

SELECT users.nb, prefmap.area
FROM users
LEFT OUTER JOIN prefmap
ON (instr(prefmap.prefix,users.nb)=1)

I get an error on line 4: "Both left and right aliases encountered in Join '1'".

How can I get this to work? I'm using Hive 0.9. Thanks for any advice.


Answer 1:


Probably not the best solution, but at least it does the job: put the matching condition in WHERE instead of ON(), and force ON() to TRUE.

SELECT users.nb, prefmap.area
FROM users
LEFT OUTER JOIN prefmap
ON (true)
WHERE instr(users.nb, prefmap.prefix) = 1

It's not perfect, as it's a bit slow. For every phone number it creates as many temporary (useless) rows as there are entries in the prefix table, and only then does the WHERE condition keep the single right one. So it's better to use this only if the prefix table is not too long. Can anyone think of a better way to do this?
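One possible way to reduce the number of temporary rows, offered as an untested sketch rather than a confirmed fix for Hive 0.9: expand each number once per distinct prefix length instead of once per prefix, then match with a plain equality join. The aliases u, l, c and p and the columns len and candidate are made up for illustration; ON (true) is reused from the query above.

```sql
-- Sketch only: not verified on Hive 0.9.
-- u, l, c, p, len and candidate are illustrative names.
SELECT c.nb, p.area
FROM (
  -- One candidate substring per (number, distinct prefix length) pair.
  SELECT u.nb, substr(u.nb, 1, l.len) AS candidate
  FROM users u
  JOIN (SELECT DISTINCT length(prefix) AS len FROM prefmap) l
  ON (true)
  -- Skip lengths longer than the number itself, so that every
  -- candidate is exactly len characters long.
  WHERE length(u.nb) >= l.len
) c
-- Plain equality between one column from each side.
JOIN prefmap p
ON (c.candidate = p.prefix);
```

The intermediate result then grows with the number of distinct prefix lengths rather than with the number of prefixes. Like the WHERE version, this drops numbers that match no prefix.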




Answer 2:


Hive cannot convert (instr(prefmap.prefix,users.nb)=1) into a MapReduce job.

Hive joins support only equality expressions. See the Hive joins wiki for more information.
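For contrast, here is a sketch of the kind of condition that fits this restriction: an equality in which each side refers to only one table. It assumes, purely for illustration, that every prefix is exactly 3 characters long, which is not the case in the question.

```sql
-- Hypothetical fixed-length case: every prefix is assumed to be 3 characters.
-- The left side of "=" uses only users, the right side only prefmap.
SELECT users.nb, prefmap.area
FROM users
LEFT OUTER JOIN prefmap
ON (substr(users.nb, 1, 3) = prefmap.prefix);
```

The instr() condition in the question does not fit this shape, because a single function call references columns from both tables.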



Source: https://stackoverflow.com/questions/13017897/join-2-tables-in-hive-using-a-phone-number-and-a-prefix-variable-length
