Join Tables on Date Range in Hive

只谈情不闲聊 提交于 2020-01-03 10:49:33

问题


I need to join tableA to tableB on employee_id and the cal_date from table A need to be between date start and date end from table B. I ran below query and received below error message, Would you please help me to correct and query. Thank you for you help!

Both left and right aliases encountered in JOIN 'date_start'.

select a.*, b.skill_group 
from tableA a 
  left join tableB b 
    on a.employee_id= b.employee_id 
    and a.cal_date >= b.date_start 
    and a.cal_date <= b.date_end

回答1:


RTFM - quoting LanguageManual Joins

Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.

You may try to move the BETWEEN filter to a WHERE clause, resulting in a lousy partially-cartesian-join followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may work fast - or take whole days.




回答2:


If your situation allows, do it in two queries.

First with the full join, which can have the range; Then with an outer join, matching on all the columns, but include a where clause for where one of the fields is null.

Ex:

create table tableC as
select a.*, b.skill_group 
    from tableA a 
    ,    tableB b 
    where a.employee_id= b.employee_id 
      and a.cal_date >= b.date_start 
      and a.cal_date <= b.date_end;

with c as (select * from TableC)
insert into tableC
select a.*, cast(null as string) as skill_group
from tableA a 
  left join c
    on (a.employee_id= c.employee_id 
    and a.cal_date  = c.cal_date)
where c.employee_id is null ;


来源:https://stackoverflow.com/questions/35944968/join-tables-on-date-range-in-hive

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!