I have the following table in Hive:
user-id, user-name, user-address,clicks,impressions,page-id,page-name
I need to find out top 5 users[user-id,user-name,user-ad
Revised answer, fixing the bug as mentioned by @Himanshu Gahlot
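SELECT page-id, user-id, clicks
FROM (
    -- number the rows within each page-id, in the order they arrive
    SELECT page-id, user-id, rank(page-id) AS rank, clicks
    FROM (
        -- send all rows for a page-id to the same reducer,
        -- sorted so the highest click counts come first
        SELECT page-id, user-id, clicks
        FROM mytable
        DISTRIBUTE BY page-id
        SORT BY page-id, clicks DESC
    ) a
) b
WHERE rank < 5
ORDER BY page-id, rank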
Note that the rank() function is applied to the page-id column: each time the page-id value changes the rank counter is reset, and otherwise it is incremented, so the counter restarts for each page-id partition.
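The reset-or-increment behaviour of the counter can be sketched in plain Java. This is only an illustration under assumptions: the class name `RankCounter` and method `evaluate` are hypothetical, the class does not depend on Hive at all, and the counter is assumed to start at 0, which is what `rank < 5` would need in order to keep five rows per page. A real Hive function would additionally have to be packaged and registered with Hive.

```java
import java.util.Objects;

// Hypothetical stand-in for the rank() function used in the query.
// It only works if rows arrive grouped by key, which is what
// DISTRIBUTE BY / SORT BY guarantee within each reducer.
final class RankCounter {
    private int counter;     // position of the current row within its group (assumed 0-based)
    private String lastKey;  // key seen on the previous call

    int evaluate(String key) {
        // A new key value means a new group: restart the counter.
        if (!Objects.equals(key, lastKey)) {
            counter = 0;
            lastKey = key;
        }
        // Return the current position, then advance it for the next row.
        return counter++;
    }
}
```

Feeding it the keys p1, p1, p1, p2, p2 in that order yields 0, 1, 2, 0, 1. Because the state is just "the previous key", the sketch gives wrong ranks if rows for the same page-id are not contiguous, which is why the inner query sorts before rank() is applied.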