Group by, rank, and aggregate a Spark DataFrame using PySpark


Add a rank column with a window function:

from pyspark.sql.functions import dense_rank, desc, collect_list, struct, sort_array
from pyspark.sql.window import Window

# Dense-rank rows within each A partition, highest C first
ranked = df.withColumn(
  "rank", dense_rank().over(Window.partitionBy("A").orderBy(desc("C"))))

Group by B, collecting (A, rank) structs:

# collect_list gives no ordering guarantee, so keep A alongside rank for sorting later
grouped = ranked.groupBy("B").agg(collect_list(struct("A", "rank")).alias("tmp"))

Sort and select (sort_array orders the structs by their first field, A, so the extracted ranks line up by A):

grouped.select("B", sort_array("tmp")["rank"].alias("ranks"))

Tested with Spark 2.1.0.
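
For a quick end-to-end check, here is a minimal self-contained sketch; the toy values in columns A, B, and C are made up for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import dense_rank, desc, collect_list, struct, sort_array
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data using the question's column names
df = spark.createDataFrame(
  [("a1", "b1", 3), ("a1", "b2", 1), ("a2", "b1", 2), ("a2", "b2", 2)],
  ["A", "B", "C"])

ranked = df.withColumn(
  "rank", dense_rank().over(Window.partitionBy("A").orderBy(desc("C"))))

grouped = ranked.groupBy("B").agg(collect_list(struct("A", "rank")).alias("tmp"))

grouped.select("B", sort_array("tmp")["rank"].alias("ranks")).show()

# Expected output (row order may vary; note the tie on C=2 within A=a2
# gives both rows dense_rank 1):
# +---+------+
# |  B| ranks|
# +---+------+
# | b1|[1, 1]|
# | b2|[2, 1]|
# +---+------+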
