Question
I have a Hive table with 5 columns, as shown below:
name orderno productcategory amount description
KJFSFKS 1 1 40 D1
KJFSFKS 2 2 50 D2
KJFSFKS 3 2 67 D3
KJFSFKS 4 2 10 D4
KJFSFKS 5 3 2 D5
KJFSFKS 6 3 5 D6
KJFSFKS 7 3 6 D7
KJFSFKS 8 4 8 D8
KJFSFKS 9 5 8 D9
KJFSFKS 10 5 10 D10
Desired output: rows that share the same productcategory code should be merged into one row. For each group, sum the amount field, take the description from the row with the highest orderno, and always take the lowest orderno. The output should look like this:
name orderno productcategory amount description
KJFSFKS 1 1 40 D1
KJFSFKS 2 2 127 D4
KJFSFKS 5 3 13 D7
KJFSFKS 8 4 8 D8
KJFSFKS 9 5 18 D10
As described above, some fields are picked by one ordering and others by a different ordering.
I used GROUP BY, and sum(amount) works fine, but what about the description field? It has to be picked based on the orderno column. My real requirement also has other columns that should be picked based on the order number.
Answer 1:
select name, orderno, productcategory, amount, description
from
(
select name, orderno, productcategory,
sum(amount) over(partition by name, productcategory) amount,
first_value(description) over(partition by name, productcategory order by orderno desc) description,
row_number() over (partition by name, productcategory order by orderno) rn
from your_table
)s where rn=1; --pick lowest orderno
OK
KJFSFKS 1 1 40 D1
KJFSFKS 2 2 127 D4
KJFSFKS 5 3 13 D7
KJFSFKS 8 4 8 D8
KJFSFKS 9 5 18 D10
Time taken: 12.492 seconds, Fetched: 5 row(s)
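The query above gets the description of the highest orderno by sorting in descending order and taking first_value. A variant that sorts in ascending order and uses last_value is sketched below. It needs an explicit window frame, because when ORDER BY is present without a frame, Hive defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so last_value would otherwise just return the current row's description. This is only a sketch against the same your_table placeholder used above; on the sample data it should return the same five rows.

```sql
select name, orderno, productcategory, amount, description
from
(
select name, orderno, productcategory,
       -- total amount per (name, productcategory) group
       sum(amount) over (partition by name, productcategory) as amount,
       -- description of the highest orderno; the explicit frame makes
       -- last_value look at the whole partition, not just up to the current row
       last_value(description) over (partition by name, productcategory
                                     order by orderno
                                     rows between unbounded preceding
                                              and unbounded following) as description,
       -- rn = 1 marks the row with the lowest orderno in each group
       row_number() over (partition by name, productcategory order by orderno) as rn
from your_table
) s
where rn = 1;
```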
Answer 2:
select name
,min(orderno) as orderno
,productcategory
,sum(amount) as amount
,max(named_struct('orderno',orderno,'description',description)).description as description -- description of the highest orderno
from mytable
group by name
,productcategory
;
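The question also mentions other columns that should be picked by order number. One possible way to extend the struct approach is to put all such columns into the same struct and read them out in an outer query. In the sketch below, status is a hypothetical extra column that does not appear in the original table, and latest and min_orderno are illustrative aliases. As in the query above, it assumes that max over a struct compares the first field first, so orderno has to stay the first field.

```sql
select name,
       min_orderno        as orderno,
       productcategory,
       amount,
       latest.description as description,
       latest.status      as status        -- hypothetical extra column
from
(
select name,
       productcategory,
       min(orderno) as min_orderno,        -- lowest orderno in the group
       sum(amount)  as amount,             -- total amount in the group
       -- struct of the row with the highest orderno; orderno must come first
       max(named_struct('orderno', orderno,
                        'description', description,
                        'status', status)) as latest
from mytable
group by name, productcategory
) t;
```

Building the struct once and extracting its fields in the outer query avoids repeating the max(named_struct(...)) expression for every column.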
Source: https://stackoverflow.com/questions/45328004/hive-sql-bundling-columns-for-few-columns-rest-of-the-columns-are-pull-based-low