In Hive, replacing NULL values with values from the same column using COALESCE

随声附和 submitted on 2021-02-05 09:36:45

Question


I would like to replace the NULL values in a particular column with values from the same column.

I have tried the query below:

select
    d_day,
    COALESCE(
        val,
        LAST_VALUE(val, TRUE) OVER (ORDER BY d_day ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    ) as val
from data_table


Answer 1:


One way to do it is with two window functions. Here is an example:

with tmp_table as (
  select 1 as ts, 3 as val 
  union all
  select 2 as ts, NULL as val
  union all 
  select 3 as ts, NULL as val
  union all
  select 4 as ts, 4 as val
  union all
  select 5 as ts, NULL as val
  union all
  select 6 as ts, 5 as val
  union all 
  select 7 as ts, 6 as val
)
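, rank_table as (
  -- rnk is the sum of val from the current row to the last row; NULLs add nothing,
  -- so each NULL row gets the same rnk as the next non-NULL row
  select *, SUM(val) OVER (ORDER BY ts ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
  from tmp_table
)
-- max(val) picks up the non-NULL value in each rnk group
select *, max(val) over (partition by rnk) as filled_val
  from rank_table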
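
To see why the grouping works, it can help to look at the intermediate rnk column for the sample data. The sketch below is not part of the original query: it keeps the SUM-based label and adds a second label, cnt_rnk, built with COUNT(val). The names cnt_rnk and filled_val are illustrative, and the NULLs are cast to int explicitly as a precaution.

```sql
-- Sketch only: same sample data as above; cnt_rnk and filled_val are illustrative names.
with tmp_table as (
  select 1 as ts, 3 as val
  union all select 2 as ts, cast(NULL as int) as val
  union all select 3 as ts, cast(NULL as int) as val
  union all select 4 as ts, 4 as val
  union all select 5 as ts, cast(NULL as int) as val
  union all select 6 as ts, 5 as val
  union all select 7 as ts, 6 as val
)
, rank_table as (
  select ts, val,
         -- label from the answer: sum of val from this row to the last row
         SUM(val)   OVER (ORDER BY ts ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk,
         -- assumed alternative: number of non-NULL vals from this row to the last row
         COUNT(val) OVER (ORDER BY ts ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as cnt_rnk
  from tmp_table
)
select ts, val, rnk, cnt_rnk,
       max(val) over (partition by cnt_rnk) as filled_val
from rank_table
order by ts;

-- Rows worked out by hand for the sample data:
-- ts  val   rnk  cnt_rnk  filled_val
-- 1   3     18   4        3
-- 2   NULL  15   3        4
-- 3   NULL  15   3        4
-- 4   4     15   3        4
-- 5   NULL  11   2        5
-- 6   5     11   2        5
-- 7   6     6    1        6
```

Both labels produce the same groups here. The SUM-based label can merge two groups when val contains 0 or negative numbers, because a 0 adds nothing to the running sum, whereas the count changes at every non-NULL row.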

So in your case:

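with rank_table as (
  -- label each NULL row with the same rnk as the next non-NULL row
  select *, SUM(val) OVER (ORDER BY d_day ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
  from data_table
)
-- fill every row with the non-NULL val of its rnk group
select *, max(val) over (partition by rnk) as filled_val
  from rank_table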
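
A single window function may also be enough. The sketch below assumes that FIRST_VALUE accepts the skip-NULLs flag as its second argument, in the same way as the LAST_VALUE(val, TRUE) call in the question. The column name filled_val is illustrative, and data_table is the table name from the question.

```sql
-- Sketch only: assumes FIRST_VALUE(expr, TRUE) skips NULLs, like LAST_VALUE(val, TRUE)
-- in the question. filled_val is an illustrative name.
select
    d_day,
    val,
    -- first non-NULL val at or after the current row, in d_day order
    FIRST_VALUE(val, TRUE) OVER (
        ORDER BY d_day
        ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
    ) as filled_val
from data_table;
```

The attempt in the question scans the preceding rows, so it carries the previous value forward. This query looks at the following rows instead, so each NULL takes the next non-NULL value, which is the same fill direction as the two-step query.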

Keep in mind that the first window function, which uses ORDER BY d_day without a PARTITION BY, makes that stage run on a single reducer, so if your data is really large it might take some time to finish.
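
If the data has a column that splits it into independent series, adding it as a PARTITION BY lets Hive distribute the window across reducers. This is a rough sketch under that assumption: grp_key is a hypothetical column that does not appear in the question, and COUNT(val) is used as the group label in place of SUM(val) so that zeros or negative values cannot merge two groups.

```sql
-- Sketch only: grp_key is a hypothetical column; it is not in the question.
with rank_table as (
  select *,
         -- number of non-NULL vals from this row to the end of its grp_key partition;
         -- each NULL row shares the label of the next non-NULL row in that partition
         COUNT(val) OVER (PARTITION BY grp_key ORDER BY d_day
                          ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as rnk
  from data_table
)
select *,
       -- the label is only unique inside a partition, so group by both columns
       max(val) over (partition by grp_key, rnk) as filled_val
from rank_table;
```

With this variant a NULL is filled only from later rows that share its grp_key, so it fits only when values should not be carried across series.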



Source: https://stackoverflow.com/questions/51820146/in-hive-replacing-the-null-value-by-the-same-column-values-using-coalesce
