Question
I'm trying to match phone numbers to an area using Hive. I have a table (prefmap) that maps a number prefix (prefix) to an area (area), and another table (users) with a list of phone numbers (nb). There is only one match per phone number (no sub-areas).
The problem is that the prefixes do not have a fixed length, so I cannot use the UDF substr(nb, "prefix's length") in the JOIN's ON() condition to match the leading substring of a number against a prefix.
And when I try to use instr() to find if a number has a matching prefix:
SELECT users.nb, prefmap.area
FROM users
LEFT OUTER JOIN prefmap
ON (instr(users.nb, prefmap.prefix) = 1)
I get an error on line 4: "Both left and right aliases encountered in Join '1'".
How can I get this to work? I'm using Hive 0.9. Thanks for any advice.
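Argument order matters for the matching condition. Hive's documentation describes instr(str, substr) as returning the 1-based position of the first occurrence of substr in str, or 0 if substr is not found, so "nb starts with prefix" is instr(nb, prefix) = 1; swapping the arguments tests the opposite relation. The Python sketch below models that behaviour. hive_instr and starts_with_prefix are hypothetical helpers (not Hive APIs), and the phone number is made up.

```python
def hive_instr(s: str, sub: str) -> int:
    """Model of Hive's instr(str, substr): 1-based position of the first
    occurrence of sub in s, or 0 when sub does not occur (str.find gives -1)."""
    return s.find(sub) + 1


def starts_with_prefix(nb: str, prefix: str) -> bool:
    # Correct argument order: look for the prefix inside the number.
    return hive_instr(nb, prefix) == 1


nb = "33612345678"
print(hive_instr(nb, "336"))         # 1 -> the number starts with "336"
print(hive_instr("336", nb))         # 0 -> reversed arguments never match a longer number
print(starts_with_prefix(nb, "44"))  # False
```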
Answer 1:
Probably not the best solution, but at least it does the job: use WHERE to define the matching condition instead of ON(), which is now forced to TRUE.
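SELECT users.nb, prefmap.area
FROM users
LEFT OUTER JOIN prefmap
ON (true)
-- The WHERE filter also drops users with no matching prefix,
-- so despite LEFT OUTER JOIN this behaves like an inner join.
WHERE instr(users.nb, prefmap.prefix) = 1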
It's not perfect, as it's slow: for each phone number, the join first creates one temporary (and mostly useless) row per entry in the prefix table, and only then does the WHERE condition keep the single matching row. So it's best used only when the prefix table is not too large. Can anyone think of a better way to do this?
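To make that cost concrete, the following Python sketch models the ON (true) ... WHERE approach on a tiny, made-up dataset: it builds every (number, prefix) pair first and only then filters. The function name and data are hypothetical, and the sketch models row counts, not Hive's actual execution plan.

```python
from itertools import product

# Hypothetical sample data standing in for the users and prefmap tables.
users = ["33612345678", "4420712345", "15551234567"]
prefmap = {"336": "FR mobile", "4420": "London", "49": "Germany"}


def cross_join_then_filter(numbers, prefixes):
    """Mimic JOIN ... ON (true) WHERE instr(nb, prefix) = 1: build every
    (number, prefix) pair first, then keep only pairs where the number
    starts with the prefix. Returns (matches, number of pairs examined)."""
    pairs = list(product(numbers, prefixes.items()))
    matches = [(nb, area) for nb, (prefix, area) in pairs if nb.startswith(prefix)]
    return matches, len(pairs)


matches, examined = cross_join_then_filter(users, prefmap)
print(examined)  # 9 = 3 users x 3 prefixes
print(matches)   # [('33612345678', 'FR mobile'), ('4420712345', 'London')]
```

With n phone numbers and m prefixes the join materializes n * m pairs, which is why it slows down as prefmap grows. The third number, which has no matching prefix, is absent from the result, just as the WHERE clause removes it in Hive.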
Answer 2:
Hive cannot convert a join condition such as (instr(prefmap.prefix,users.nb)=1) into a MapReduce job, so Hive joins only support equality expressions. See the Hive Joins wiki page for more information.
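Because Hive joins only accept equality conditions, one common workaround (not proposed in the answers above) is to generate, for each number, every candidate prefix up to the longest prefix length in prefmap and join on candidate = prefix with an ordinary equality. In Hive the candidates could, for example, be built with substr() and expanded into rows with LATERAL VIEW explode(); the Python sketch below only models the logic, assuming that approach, with a dict lookup playing the role of the hash join. The sample data and function names are hypothetical.

```python
# Hypothetical sample data standing in for the users and prefmap tables.
users = ["33612345678", "4420712345", "15551234567"]
prefmap = {"336": "FR mobile", "4420": "London", "49": "Germany"}


def candidate_prefixes(nb, max_len):
    """Every leading substring of nb of length 1..max_len, i.e. what
    substr(nb, 1, k) would return for k = 1..max_len."""
    return [nb[:k] for k in range(1, min(max_len, len(nb)) + 1)]


def equi_join_on_prefixes(numbers, prefixes):
    """Left-outer-join numbers to areas using only equality lookups.
    Relies on the question's guarantee of at most one match per number."""
    max_len = max(len(p) for p in prefixes)
    result = []
    for nb in numbers:
        area = None  # stays None (NULL) when no prefix matches, as in a LEFT OUTER JOIN
        for cand in candidate_prefixes(nb, max_len):
            if cand in prefixes:  # equality test, like ON (cand = prefmap.prefix)
                area = prefixes[cand]
        result.append((nb, area))
    return result


for row in equi_join_on_prefixes(users, prefmap):
    print(row)
# ('33612345678', 'FR mobile')
# ('4420712345', 'London')
# ('15551234567', None)
```

The work per number is bounded by the longest prefix length rather than by the size of prefmap, which is what makes this cheaper than the cross join when the prefix table is large.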
Source: https://stackoverflow.com/questions/13017897/join-2-tables-in-hive-using-a-phone-number-and-a-prefix-variable-length