Hive getting top n records in group by query

终归单人心 2020-12-07 17:47

I have the following table in Hive:

user-id, user-name, user-address,clicks,impressions,page-id,page-name

I need to find out top 5 users[user-id,user-name,user-ad

6 answers
  •  广开言路
    2020-12-07 18:51

    Revised answer, fixing the bug as mentioned by @Himanshu Gahlot

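    SELECT page-id, user-id, clicks
    FROM (
        -- rank(page-id) numbers the rows of each page-id in the sorted order
        SELECT page-id, user-id, rank(page-id) as rank, clicks FROM (
            -- route all rows of a page-id to the same reducer,
            -- sorted by clicks descending within that page-id
            SELECT page-id, user-id, clicks FROM mytable
            DISTRIBUTE BY page-id
            SORT BY page-id, clicks desc
        ) a
    ) b
    WHERE rank < 5
    ORDER BY page-id, rank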
    

    Note that the rank() UDAF is applied to the page-id column: whenever the page-id value changes, the rank counter is reset, and otherwise it is incremented (i.e. the counter restarts for each page-id partition).
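
    For comparison, the same per-page-id numbering can be sketched with Hive's built-in windowing functions (available from Hive 0.11), which is a different technique from the rank() UDAF used above. This is only a sketch under assumptions: the columns are taken to be named page_id, user_id and clicks (underscores instead of hyphens), and mytable is the table from the query above.

    ```sql
    -- Assumption: columns are named page_id, user_id, clicks (not in the original post).
    SELECT page_id, user_id, clicks
    FROM (
        SELECT page_id, user_id, clicks,
               -- restart the numbering at 1 for every page_id, highest clicks first
               ROW_NUMBER() OVER (PARTITION BY page_id ORDER BY clicks DESC) AS rn
        FROM mytable
    ) ranked
    WHERE rn <= 5              -- ROW_NUMBER() is 1-based, so this keeps five rows per page_id
    ORDER BY page_id, rn;
    ```

    ROW_NUMBER() breaks ties arbitrarily, so each page_id gets at most five rows; RANK() or DENSE_RANK() could be swapped in if tied click counts should share a position.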
