Hive QL - Limiting the number of rows per item

Submitted by 无人久伴 on 2019-11-29 19:13:01

Question


If I have multiple items listed in a WHERE clause, how would I go about limiting the results to N rows for each item in the list?

Example:

select a_id, b, c, count(*) as sumrequests
from table_name
where
a_id in (1,2,3)
group by a_id,b,c
limit 10000

Answer 1:


It sounds like you want the top N rows per a_id. You can do this with a window function; windowing and analytics functions were introduced in Hive 0.11. Something like:

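SELECT a_id, b, c, count(*) AS sumrequests
FROM (
    -- named rn rather than row: ROW is a reserved keyword in newer Hive releases
    SELECT a_id, b, c, row_number() OVER (PARTITION BY a_id) AS rn
    FROM table_name
    -- filter before numbering so only the requested a_id values are numbered
    WHERE a_id IN (1, 2, 3)
    ) rs
WHERE rn <= 10000
GROUP BY a_id, b, c;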

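This keeps at most 10,000 source rows per a_id and then aggregates them, so the limit applies to the input rows, not to the grouped output. Without an ORDER BY inside OVER(), which rows are kept is arbitrary (not guaranteed to be random or repeatable between runs). You can partition further (for example PARTITION BY a_id, b) if you need the limit per combination of columns rather than per a_id alone. You can also add ORDER BY inside the window to control which rows are kept; there are many examples online showing additional options.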
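If the goal is instead to keep the N largest grouped rows per a_id (so the limit applies to the (a_id, b, c, sumrequests) output), one option is to aggregate first and then apply row_number() with an ORDER BY over the grouped result. The sketch below runs that query through Python's sqlite3 module as a stand-in for Hive, assuming SQLite 3.25 or later (required for window functions). The sample rows, the function name top_n_per_id, and the tie-break on b, c are hypothetical. The SQL text should carry over to Hive 0.11+ with the ? placeholder replaced by a literal such as 10000, though this sketch was written against SQLite, not tested on Hive.

```python
import sqlite3

# Hypothetical sample data standing in for Hive's table_name: (a_id, b, c).
SAMPLE_ROWS = [
    (1, "x", "p"), (1, "x", "p"), (1, "x", "p"),
    (1, "y", "p"), (1, "y", "p"),
    (1, "z", "q"),
    (2, "x", "p"),
    (2, "y", "q"), (2, "y", "q"),
    (3, "z", "r"),
    (4, "x", "p"),  # a_id not in the IN list, so it is filtered out
]

# Aggregate first, then number the grouped rows within each a_id,
# largest count first; b, c break ties so the result is deterministic.
TOP_N_PER_ID = """
SELECT a_id, b, c, sumrequests
FROM (
    SELECT a_id, b, c, sumrequests,
           row_number() OVER (PARTITION BY a_id
                              ORDER BY sumrequests DESC, b, c) AS rn
    FROM (
        SELECT a_id, b, c, count(*) AS sumrequests
        FROM table_name
        WHERE a_id IN (1, 2, 3)
        GROUP BY a_id, b, c
    ) grouped
) ranked
WHERE rn <= ?
ORDER BY a_id, sumrequests DESC, b, c
"""


def top_n_per_id(n):
    """Return at most n (a_id, b, c, sumrequests) rows for each a_id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_name (a_id INTEGER, b TEXT, c TEXT)")
        conn.executemany("INSERT INTO table_name VALUES (?, ?, ?)", SAMPLE_ROWS)
        return conn.execute(TOP_N_PER_ID, (n,)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_n_per_id(2):
        print(row)
```

With n = 2, a_id 1 keeps only its two most frequent (b, c) groups, while a_id 2 and 3 keep everything they have. Changing PARTITION BY a_id to PARTITION BY a_id, b would cap the rows per (a_id, b) pair instead.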



Source: https://stackoverflow.com/questions/11750312/hive-ql-limiting-number-of-rows-per-each-item
