Hive getting top n records in group by query

终归单人心 2020-12-07 17:47

I have the following table in Hive:

user-id, user-name, user-address,clicks,impressions,page-id,page-name

I need to find out top 5 users[user-id,user-name,user-ad

6 answers
  •  北海茫月
    2020-12-07 18:49

    Suppose your data looks like the following:

    page-id   user-id   clicks
    page1     user1     10
    page1     user2     10
    page1     user3     9
    page1     user4     8
    page1     user5     7
    page1     user6     7
    page1     user7     6
    page1     user8     5
    page2     user1     20
    page2     user2     19
    page2     user3     18
    

    The query below ranks users within each page by clicks and keeps ranks 1 through 5:

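    -- Hyphenated column names must be backtick-quoted in Hive;
    -- unquoted, page-id would parse as "page minus id".
    SELECT `page-id`, `user-id`, clicks, rank
    FROM (
        SELECT `page-id`, `user-id`,
               -- rank() gives tied rows the same rank and skips the following ranks
               rank() OVER (PARTITION BY `page-id` ORDER BY clicks DESC) AS rank,
               clicks
        FROM your_table
    ) ranked_table
    WHERE ranked_table.rank <= 5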
    

    Result:

    page-id   user-id   clicks  rank
    page1     user1     10      1
    page1     user2     10      1 
    page1     user3     9       3 
    page1     user4     8       4
    page1     user5     7       5 
    page1     user6     7       5 
    page2     user1     20      1
    page2     user2     19      2  
    page2     user3     18      3
    

    For page1 you get 6 users, because rank() gives users with the same number of clicks the same rank: user5 and user6 both have 7 clicks and share rank 5.

    If you need exactly 5 users per page, and it is acceptable to pick arbitrarily among users tied on the same number of clicks, use row_number() instead:

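    -- Hyphenated column names must be backtick-quoted in Hive;
    -- unquoted, page-id would parse as "page minus id".
    SELECT `page-id`, `user-id`, clicks, rank
    FROM (
        SELECT `page-id`, `user-id`,
               -- row_number() numbers rows 1, 2, 3, ... with no ties
               row_number() OVER (PARTITION BY `page-id` ORDER BY clicks DESC) AS rank,
               clicks
        FROM your_table
    ) ranked_table
    WHERE ranked_table.rank <= 5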
    

    Result:

    page-id   user-id   clicks  rank
    page1     user1     10      1
    page1     user2     10      2 
    page1     user3     9       3 
    page1     user4     8       4
    page1     user5     7       5 
    page2     user1     20      1
    page2     user2     19      2  
    page2     user3     18      3
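
    With only ORDER BY clicks DESC, row_number() does not define which of the tied users comes first, so the rows returned for page1 could differ between runs. Below is a minimal sketch of a deterministic variant. It assumes that breaking ties by user-id in ascending order is acceptable; that tie-break rule and the alias rn are illustrative choices, not part of the original answer.

    ```sql
    -- Hyphenated column names are backtick-quoted so Hive does not
    -- parse them as subtraction (page-id would read as "page minus id").
    SELECT `page-id`, `user-id`, clicks, rn
    FROM (
        SELECT `page-id`, `user-id`, clicks,
               -- The secondary sort key (assumed: user-id ascending) makes
               -- the ordering of tied rows reproducible.
               row_number() OVER (PARTITION BY `page-id`
                                  ORDER BY clicks DESC, `user-id` ASC) AS rn
        FROM your_table
    ) ranked_table
    WHERE ranked_table.rn <= 5
    ```

    On the sample data above, this always keeps user5 rather than user6 as the fifth row for page1, because both have 7 clicks and user5 sorts first.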
    
