I have a dataframe that looks like:
A    B    C
---------------
A1   B1   0.8
A1   B2   0.55
A1   B3   0.43
A2   B1   0.7
A2   B2   0.5
A2   B3   0.5
Add a dense rank within each value of A, ordered by C descending:
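from pyspark.sql.functions import collect_list, dense_rank, desc, sort_array, struct
from pyspark.sql.window import Window

# Rank rows within each A by C, highest first; tied values of C share a rank.
ranked = df.withColumn(
    "rank", dense_rank().over(Window.partitionBy("A").orderBy(desc("C"))))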
Group by B, collecting the (A, rank) pairs for each group:
grouped = ranked.groupBy("B").agg(collect_list(struct("A", "rank")).alias("tmp"))
Sort each collected list by A and select only the ranks:
grouped.select("B", sort_array("tmp")["rank"].alias("ranks"))
Tested with Spark 2.1.0.
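To make it easier to see what the three steps compute, here is a rough plain-Python sketch of the same logic applied to the sample rows from the question. It is only an illustration and does not use Spark; the helper names `dense_rank_by_a` and `ranks_by_b` are made up for this example. It assumes that `sort_array` orders the structs by their first field (A), which is why the pairs are stored as `(A, rank)`.

```python
from collections import defaultdict

# Sample rows from the question: (A, B, C).
rows = [
    ("A1", "B1", 0.8), ("A1", "B2", 0.55), ("A1", "B3", 0.43),
    ("A2", "B1", 0.7), ("A2", "B2", 0.5), ("A2", "B3", 0.5),
]

def dense_rank_by_a(rows):
    """Mimic dense_rank() over (partition by A, order by C descending)."""
    scores = defaultdict(set)
    for a, _, c in rows:
        scores[a].add(c)
    # Distinct scores per A, highest first; ties share a rank, with no gaps.
    rank_of = {
        a: {c: i for i, c in enumerate(sorted(cs, reverse=True), start=1)}
        for a, cs in scores.items()
    }
    return [(a, b, rank_of[a][c]) for a, b, c in rows]

def ranks_by_b(ranked):
    """Mimic groupBy("B"), collect_list(struct("A", "rank")), sort_array."""
    groups = defaultdict(list)
    for a, b, rank in ranked:
        groups[b].append((a, rank))
    # Sorting the (A, rank) pairs orders each list by A first.
    return {b: [rank for _, rank in sorted(pairs)] for b, pairs in groups.items()}

result = ranks_by_b(dense_rank_by_a(rows))
print(result)
```

Note that the two A2 rows with C = 0.5 tie, so under dense ranking both B2 and B3 get rank 2 within A2.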