How to select the first row of each group?

心在旅途 2020-11-21 05:49

I have a DataFrame generated as follows:

df.groupBy($"Hour", $"Category")
  .agg(sum($"value") as "TotalValue")
  .sort($"Hour".asc, $"TotalValue"


        
8 Answers
  •  庸人自扰
    2020-11-21 06:30

    For Spark 2.0.2 with grouping by multiple columns:

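    import org.apache.spark.sql.functions.row_number
    import org.apache.spark.sql.expressions.Window
    import spark.implicits._  // enables the $"col" syntax; assumes a SparkSession named spark
    
    // Partition by the grouping columns; newest row first within each group
    val w = Window.partitionBy($"col1", $"col2", $"col3").orderBy($"timestamp".desc)
    
    // Number the rows within each group, keep only the first, then drop the helper column
    val refined_df = df.withColumn("rn", row_number().over(w)).where($"rn" === 1).drop("rn")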
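
As a rough, self-contained sketch of the same window approach applied to a DataFrame shaped like the one in the question, the following may help. The sample rows, the `FirstRowPerGroup` object, and the `firstPerHour` helper are made up for illustration. It also assumes that "first row" means the row with the highest `TotalValue` within each `Hour`, which the truncated question does not confirm.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object FirstRowPerGroup {

  // Hypothetical helper: keeps one row per Hour, the one with the largest TotalValue.
  def firstPerHour(df: DataFrame): DataFrame = {
    // One window per Hour, highest TotalValue first
    val w = Window.partitionBy(col("Hour")).orderBy(col("TotalValue").desc)

    df.withColumn("rn", row_number().over(w)) // 1, 2, 3, ... within each Hour
      .where(col("rn") === 1)                 // keep only the top row of each group
      .drop("rn")                             // remove the helper column
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("first-row-per-group")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for the aggregated DataFrame
    val df = Seq(
      (0, "catA", 30.0),
      (0, "catB", 20.0),
      (1, "catC", 15.0),
      (1, "catD", 25.0)
    ).toDF("Hour", "Category", "TotalValue")

    firstPerHour(df).orderBy(col("Hour")).show()

    spark.stop()
  }
}
```

Note that `row_number` assigns distinct numbers even when two rows tie on `TotalValue`, so which of the tied rows is kept is not deterministic unless a tie-breaking column is added to `orderBy`.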
    
