I have a DataFrame in which I want to assign an ID to each Window partition, so that rows in the same partition share the same ID. For example, I have
+---+---+
|id |col|
+---+---+
|1  |a  |
|2  |a  |
|3  |b  |
|4  |c  |
|5  |c  |
+---+---+
Applying the built-in dense_rank function over a Window ordered by col should give you the desired result:
from pyspark.sql import Window
import pyspark.sql.functions as f
# dense_rank over a window ordered by col: rows with equal col get the same consecutive number
df.select('id', f.dense_rank().over(Window.orderBy('col')).alias('group')).show(truncate=False)
which should give you
+---+-----+
|id |group|
+---+-----+
|1  |1    |
|2  |1    |
|3  |2    |
|4  |3    |
|5  |3    |
+---+-----+
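If it helps to see why dense_rank produces these group numbers, here is a rough plain-Python sketch of the same idea. It does not use Spark at all. The helper name dense_rank_by_col is made up for illustration, and the sketch assumes the values in col can be sorted and that the whole dataset fits in memory.

```python
# Plain-Python illustration of dense_rank semantics (no Spark involved).
# (id, col) pairs taken from the example data in the question.
rows = [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c")]


def dense_rank_by_col(rows):
    """Hypothetical helper: return (id, group) pairs where rows sharing
    a col value share a group number, numbered 1, 2, 3, ... without gaps."""
    # Distinct col values, in sorted order, get consecutive ranks starting at 1.
    distinct_values = sorted({col for _, col in rows})
    rank_of = {value: rank for rank, value in enumerate(distinct_values, start=1)}
    # Look up each row's rank by its col value, keeping the original row order.
    return [(row_id, rank_of[col]) for row_id, col in rows]


result = dense_rank_by_col(rows)
print(result)  # [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3)]
```

One thing to keep in mind with the Spark version: a window that has orderBy but no partitionBy moves all rows into a single partition, so it may be slow on a large DataFrame.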