Get top-N rows of each group after a group by using Spark DataFrame

谎友^ 2020-11-29 07:30

I have a Spark SQL DataFrame:

user1 item1 rating1
user1 item2 rating2
user1 item3 rating3
user2 item1 rating4
...

How to group by user and then return the top N items from every group, ordered by rating?

1 Answer
  • 2020-11-29 08:17

    You can use the rank window function as follows:

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{rank, desc}

    // Needed for the $"..." column syntax
    import spark.implicits._

    val n: Int = ???

    // Window partitioned by user, rows ordered by rating descending
    val w = Window.partitionBy($"user").orderBy(desc("rating"))

    // Rank rows within each user's partition and keep the top n
    df.withColumn("rank", rank.over(w)).where($"rank" <= n)
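
    For reference, a minimal self-contained sketch of the same approach (the sample ratings and n = 2 are made-up values, and it assumes an active SparkSession named spark):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{rank, desc}

    val spark = SparkSession.builder().appName("topn-example").getOrCreate()
    import spark.implicits._

    // Toy data mirroring the question's layout (ratings are made up)
    val df = Seq(
      ("user1", "item1", 4.0),
      ("user1", "item2", 5.0),
      ("user1", "item3", 3.0),
      ("user2", "item1", 2.0)
    ).toDF("user", "item", "rating")

    val n = 2
    val w = Window.partitionBy($"user").orderBy(desc("rating"))

    // Keeps the n highest-rated items per user
    df.withColumn("rank", rank.over(w)).where($"rank" <= n).show()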
    

    If you don't care about ties, you can replace rank with row_number.
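
    A minimal sketch of that variant, reusing the window w defined above (row_number assigns unique sequential numbers, so exactly n rows survive per user even when ratings tie):

    import org.apache.spark.sql.functions.row_number

    // Unlike rank, row_number never produces duplicate values,
    // so ties are broken arbitrarily within each partition
    df.withColumn("rn", row_number.over(w)).where($"rn" <= n)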
