Hive SQL: find the latest record

别那么骄傲 2021-01-30 17:28

The table is:

create table test (
  id string,
  name string,
  age string,
  modified string
);

The data looks like this:

id    name   age  modified

8 answers
  •  甜味超标
    2021-01-30 18:07

    There's a nearly undocumented feature of Hive SQL (I found it in one of their Jira bug reports) that lets you do something like argmax() using struct()s. For example, if you have a table like:

    test_argmax
    id,val,key
    1,1,A
    1,2,B
    1,3,C
    1,2,D
    2,1,E
    2,1,U
    2,2,V
    2,3,W
    2,2,X
    2,1,Y
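
    To reproduce this, the sample table can be created and filled roughly as follows. This is a minimal sketch: only the CSV data is given above, so the column types (INT for id and val, STRING for key) are assumptions, and INSERT ... VALUES needs Hive 0.14 or later.

    ```sql
    -- Hypothetical setup for the sample data above; column types are assumed.
    -- `key` is backquoted only to be safe about keyword clashes.
    CREATE TABLE test_argmax (
      id    INT,
      val   INT,
      `key` STRING
    );

    -- Same ten rows as the CSV listing (INSERT ... VALUES: Hive 0.14+).
    INSERT INTO TABLE test_argmax VALUES
      (1, 1, 'A'), (1, 2, 'B'), (1, 3, 'C'), (1, 2, 'D'),
      (2, 1, 'E'), (2, 1, 'U'), (2, 2, 'V'), (2, 3, 'W'),
      (2, 2, 'X'), (2, 1, 'Y');
    ```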
    

    You can do this:

    select 
      max(struct(val, key, id)).col1 as max_val,
      max(struct(val, key, id)).col2 as max_key,
      max(struct(val, key, id)).col3 as max_id
    from test_argmax
    group by id
    

    and get the result:

    max_val,max_key,max_id
    3,C,1
    3,W,2
    

    I think that in case of a tie on val (the first struct element), the comparison falls back to the second element, and so on. I also haven't figured out whether there's a neater syntax for getting the individual columns back out of the resulting struct; maybe using named_struct somehow?
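
    One way to avoid repeating max(struct(...)) three times is to compute it once per group in a subquery and build it with named_struct, so the fields can be read back by name instead of col1/col2/col3. The sketch below shows this on test_argmax, then applies the same pattern to the question's `test` table to get the latest row per id. The second query assumes `modified` strings sort in time order (for example zero-padded 'yyyy-MM-dd HH:mm:ss'); if they don't, the value would need converting first, e.g. with unix_timestamp(modified, pattern).

    ```sql
    -- Compute the struct maximum once per id, then read fields by name.
    SELECT t.m.val AS max_val,
           t.m.key AS max_key,
           t.m.id  AS max_id
    FROM (
      SELECT max(named_struct('val', val, 'key', key, 'id', id)) AS m
      FROM test_argmax
      GROUP BY id
    ) t;

    -- Same pattern on the question's `test` table: latest row per id.
    -- Assumes `modified` strings sort chronologically.
    -- Ties on `modified` fall back to comparing name, then age.
    SELECT t.id,
           t.latest.name     AS name,
           t.latest.age      AS age,
           t.latest.modified AS modified
    FROM (
      SELECT id,
             max(named_struct('modified', modified,
                              'name', name,
                              'age', age)) AS latest
      FROM test
      GROUP BY id
    ) t;
    ```

    Putting `modified` first in the struct is what makes max() pick the latest row: structs are compared field by field, so later fields only matter when the earlier ones are equal.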
