hive sql find the latest record

别那么骄傲 2021-01-30 17:28

The table is:

create table test (
id string,
name string,
age string,
modified string)

The data looks like this:

id    name   age  modified


        
8 answers
  •  無奈伤痛
    2021-01-30 18:10

    Hive SQL has a relatively recent feature, analytic functions with the over clause, that should do the job without joins:

    select id, name, age, last_modified 
    from ( select id, name, age, modified, 
                  max( modified) over (partition by id) as last_modified 
           from test ) as sub
    where   modified = last_modified 
    

    What's going on here is that the subquery adds an extra column, last_modified, to every row. It holds the latest modified timestamp for that row's id (similar to what group by would compute). The key point is that the subquery still returns one row per row of the original table, and the outer query then keeps only the rows whose modified equals last_modified.
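
    To make the intermediate result concrete, here is a small sketch with made-up rows (the names and values are hypothetical, not from the question). It assumes Hive 0.14 or later for insert ... values, and that modified holds yyyy-MM-dd strings so that string max() agrees with chronological order.

    ```sql
    -- Hypothetical sample rows; modified is a string in yyyy-MM-dd form.
    insert into table test values
      ('1', 'alice', '30', '2021-01-01'),
      ('1', 'alice', '31', '2021-01-15'),
      ('2', 'bob',   '25', '2021-01-10');

    -- The subquery alone keeps all three rows and adds last_modified
    -- (row order may differ):
    --   1  alice  30  2021-01-01  2021-01-15
    --   1  alice  31  2021-01-15  2021-01-15
    --   2  bob    25  2021-01-10  2021-01-10
    select id, name, age, modified,
           max(modified) over (partition by id) as last_modified
    from test;
    ```

    Filtering on modified = last_modified then leaves the second and third rows, one per id.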

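    You might be tempted to try the simpler single-level form below, but I would not expect it to work: where is evaluated before the window function, and Hive does not let where refer to a select alias such as last_modified. The subquery form above is the safer choice.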

    select  id, name, age,  
            max( modified) over (partition by id) last_modified 
    from test 
    where   modified = last_modified 
    

    By the way, the subquery version should work in Impala, too.
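
    One caveat with the max() comparison: if two rows for the same id share the same latest modified value, both come back. A possible variant, which is not part of the answer above, ranks the rows with row_number() instead. The alias rn and the tie-break on name are arbitrary choices made for illustration.

    ```sql
    -- Keep exactly one row per id: the one ranked first by modified (newest first).
    select id, name, age, modified
    from ( select id, name, age, modified,
                  row_number() over (partition by id
                                     order by modified desc, name) as rn
           from test ) sub
    where rn = 1;
    ```

    This always returns a single row per id, at the cost of picking a tie-break rule.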
