Hive/SQL: bundle rows per group by summing some columns and picking the rest by the lowest/highest value of another column

Submitted by 会有一股神秘感。 on 2019-12-12 03:38:54

Question


I have a Hive table with 5 columns, as shown below:

name     orderno  productcategory  amount  description
KJFSFKS  1        1                40      D1
KJFSFKS  2        2                50      D2
KJFSFKS  3        2                67      D3
KJFSFKS  4        2                10      D4
KJFSFKS  5        3                2       D5
KJFSFKS  6        3                5       D6
KJFSFKS  7        3                6       D7
KJFSFKS  8        4                8       D8
KJFSFKS  9        5                8       D9
KJFSFKS  10       5                10      D10

The desired output is based on the product category code: when several rows share the same productcategory, they should be combined into one row. For each group, sum the amount field, take the description from the row with the highest orderno, and always take the lowest orderno. The output should look like this:

name     orderno  productcategory  amount  description
KJFSFKS  1        1                40      D1
KJFSFKS  2        2                127     D4
KJFSFKS  5        3                13      D7
KJFSFKS  8        4                8       D8
KJFSFKS  9        5                18      D10

As described above, some fields are picked by one ordering (lowest orderno) and others by a different ordering (highest orderno).

I tried GROUP BY, and sum(amount) works fine, but what about the description field? It has to be picked based on the orderno column. My real requirement also has other columns that should be picked based on the order number.


Answer 1:


select name, orderno, productcategory, amount, description
from
(
select name, orderno, productcategory,
       -- total amount over the whole (name, productcategory) group
       sum(amount) over(partition by name, productcategory) amount,
       -- description of the row with the highest orderno in the group
       first_value(description) over(partition by name, productcategory order by orderno desc) description,
       -- rn = 1 marks the row with the lowest orderno in the group
       row_number() over (partition by name, productcategory order by orderno) rn
from  your_table
)s where rn=1; --pick lowest orderno

OK
KJFSFKS 1       1       40      D1
KJFSFKS 2       2       127     D4
KJFSFKS 5       3       13      D7
KJFSFKS 8       4       8       D8
KJFSFKS 9       5       18      D10
Time taken: 12.492 seconds, Fetched: 5 row(s)
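
If you want to check this query without a Hive cluster, here is a minimal sketch that runs the same window-function query against an in-memory SQLite database from Python. It is not Hive: it assumes SQLite 3.25 or later (for window functions) and that SQLite treats this particular query the same way Hive does. The helper name run_query, the column types, and the final "order by orderno" (added only to make the row order deterministic) are my own additions for illustration.

```python
import sqlite3

# Sample rows from the question:
# (name, orderno, productcategory, amount, description)
ROWS = [
    ("KJFSFKS", 1, 1, 40, "D1"),
    ("KJFSFKS", 2, 2, 50, "D2"),
    ("KJFSFKS", 3, 2, 67, "D3"),
    ("KJFSFKS", 4, 2, 10, "D4"),
    ("KJFSFKS", 5, 3, 2, "D5"),
    ("KJFSFKS", 6, 3, 5, "D6"),
    ("KJFSFKS", 7, 3, 6, "D7"),
    ("KJFSFKS", 8, 4, 8, "D8"),
    ("KJFSFKS", 9, 5, 8, "D9"),
    ("KJFSFKS", 10, 5, 10, "D10"),
]

# Same shape as the Hive query above; "order by orderno" is an addition
# so that the result order does not depend on the engine.
QUERY = """
select name, orderno, productcategory, amount, description
from (
    select name, orderno, productcategory,
           sum(amount) over (partition by name, productcategory) as amount,
           first_value(description) over (partition by name, productcategory
                                          order by orderno desc) as description,
           row_number() over (partition by name, productcategory
                              order by orderno) as rn
    from your_table
) s
where rn = 1
order by orderno
"""


def run_query(rows):
    """Load rows into an in-memory table and return the bundled result."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table your_table ("
            "name text, orderno integer, productcategory integer, "
            "amount integer, description text)"
        )
        conn.executemany("insert into your_table values (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_query(ROWS):
        print(row)
```

On the sample data this should give the same five rows as the Hive output above, which is a quick sanity check before running it on the real table.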



Answer 2:


select      name
           ,min(orderno)    as orderno
           ,productcategory
           ,sum(amount)     as amount
           ,max(named_struct('orderno',orderno,'description',description)).description as description

from        mytable

group by    name
           ,productcategory
;
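
The query above appears to rely on structs being compared field by field, left to right, so max() returns the struct with the highest orderno and .description then reads its description. As a rough analogy only (this is not Hive code), Python tuples compare the same way, so the whole GROUP BY can be sketched as below. The function name bundle and the sorting of the groups are assumptions made for the illustration.

```python
from collections import defaultdict

# Sample rows from the question:
# (name, orderno, productcategory, amount, description)
ROWS = [
    ("KJFSFKS", 1, 1, 40, "D1"),
    ("KJFSFKS", 2, 2, 50, "D2"),
    ("KJFSFKS", 3, 2, 67, "D3"),
    ("KJFSFKS", 4, 2, 10, "D4"),
    ("KJFSFKS", 5, 3, 2, "D5"),
    ("KJFSFKS", 6, 3, 5, "D6"),
    ("KJFSFKS", 7, 3, 6, "D7"),
    ("KJFSFKS", 8, 4, 8, "D8"),
    ("KJFSFKS", 9, 5, 8, "D9"),
    ("KJFSFKS", 10, 5, 10, "D10"),
]


def bundle(rows):
    """Group by (name, productcategory) and aggregate each group."""
    groups = defaultdict(list)
    for name, orderno, category, amount, description in rows:
        groups[(name, category)].append((orderno, amount, description))

    result = []
    for (name, category), items in sorted(groups.items()):
        # Tuples compare left to right, so max() is decided by orderno first.
        # This stands in for max(named_struct('orderno', ..., 'description', ...)).
        _, description = max((orderno, desc) for orderno, _, desc in items)
        result.append((
            name,
            min(orderno for orderno, _, _ in items),  # min(orderno)
            category,
            sum(amount for _, amount, _ in items),    # sum(amount)
            description,
        ))
    return result


if __name__ == "__main__":
    for row in bundle(ROWS):
        print(row)
```

Any other column that has to follow the highest orderno can be carried along in the same tuple (or struct) after orderno, and a column that has to follow the lowest orderno can use min() in the same way.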


Source: https://stackoverflow.com/questions/45328004/hive-sql-bundling-columns-for-few-columns-rest-of-the-columns-are-pull-based-low
