Issue generating missing dates in a Hive query

Posted by 岁酱吖の on 2019-12-02 07:22:12

Question


I have a requirement to look back 1000 days from the current date on a date column and use those 1000 dates in my next steps, but not all of those dates are present in the table. I still need the missing dates to appear in the query output. When I run the query below, it does not return the 1000 dates preceding the current date.

For example, say only two dates are present in the date column:

date      
2019-01-16 
2019-01-19

I came up with a query to get the previous 1000 dates, but it returns only the nearest date, because all the earlier dates are missing:

SELECT date FROM table1 t
WHERE date >= date_sub(current_date, 1000) AND date < current_date
ORDER BY date LIMIT 1

When I run the query above, it returns 2019-01-16. Because the dates going back 1000 days are not present in the table, it returns the nearest available date, which is 2019-01-16. What I need as output is every date, including the missing ones, starting from 2016-04-23 (the 1000th date back from the current date) until just before the current date (2019-01-18). Could you please help with this?


Answer 1:


You can generate the dates for the required range in a subquery (see the date_range subquery in the example below) and left join it with your table. If your table has no record for a given date, the joined value will be null, while the dates themselves come from the date_range subquery without gaps. Set the start_date and end_date parameters for date_range as required:

set hivevar:start_date=2016-04-23; --replace with your start_date
set hivevar:end_date=current_date; --replace with your end_date (substituted unquoted below, so quote a literal date, e.g. '2019-01-18')

set hive.exec.parallel=true;
set hive.auto.convert.join=true; --this enables map-join
set hive.mapjoin.smalltable.filesize=25000000; --size of table to fit in memory

with date_range as 
(--this query generates the date range; check its output
select date_add ('${hivevar:start_date}',s.i) as dt 
  from ( select posexplode(split(space(datediff(${hivevar:end_date},'${hivevar:start_date}')),' ')) as (i,x) ) s
) 

select d.dt as date,
       t.your_col --some value from your table on date
  from date_range d 
       left join table1 t on d.dt=t.date 
order by d.dt --order by dates if necessary
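
As a variation on the query above, here is a minimal sketch that computes the window inline instead of using the start_date and end_date variables. It assumes the goal is exactly the window from the question, that is, the 1000 dates from date_sub(current_date,1000) up to the day before current_date. table1, date and your_col are the same placeholder names used above, and current_date assumes a Hive version that supports it.

```sql
with date_range as
(--space(999) returns 999 spaces; splitting on ' ' gives 1000 empty strings,
 --so posexplode yields positions i = 0..999, one row per date in the window
select date_add(date_sub(current_date,1000),s.i) as dt
  from ( select posexplode(split(space(999),' ')) as (i,x) ) s
)

select d.dt as date,
       t.your_col --null when table1 has no row for that date
  from date_range d
       left join table1 t on d.dt=t.date
order by d.dt
```

To see what the generator produces, you can run the date_range subquery on its own with a small count first (for example space(3), which should give four rows) before joining it to the table.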


Source: https://stackoverflow.com/questions/54265149/facing-issue-in-hive-query-in-generating-missing-dates
