PySpark - get row number for each row in a group

借酒劲吻你 2020-11-29 09:25

Using PySpark, I'd like to group a Spark DataFrame, sort each group, and then assign a row number to every row within its group. For example, given:

Group    Date
  A      2000
  A      2002
          


        
2 Answers
  •  臣服心动
    2020-11-29 09:59

    Use a window function:

    from pyspark.sql.window import Window
    from pyspark.sql.functions import row_number

    # Number rows within each Group, ordered by Date (numbering starts at 1)
    window = Window.partitionBy("Group").orderBy("Date")
    df = df.withColumn("row_num", row_number().over(window))
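
To make the expected result concrete, here is a minimal plain-Python sketch of what a row number partitioned by Group and ordered by Date works out to. It is not Spark code and not how Spark computes it internally. The sample rows and the helper name `add_row_numbers` are hypothetical; only the two "A" rows come from the question.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows as (Group, Date); only the "A" rows appear in the question.
rows = [("A", 2002), ("B", 2001), ("A", 2000), ("B", 1999)]


def add_row_numbers(rows):
    """Mimic a row number partitioned by Group and ordered by Date."""
    # Sort by Group first so groupby sees each group contiguously, then by Date.
    ordered = sorted(rows, key=itemgetter(0, 1))
    result = []
    for _, members in groupby(ordered, key=itemgetter(0)):
        # Numbering restarts at 1 for every group.
        for n, (group, date) in enumerate(members, start=1):
            result.append((group, date, n))
    return result


numbered = add_row_numbers(rows)
for row in numbered:
    print(row)
```

Each group gets its own sequence starting at 1, so the two "A" rows from the question come out numbered 1 (2000) and 2 (2002). In Spark itself, rows that tie on the ordering column receive distinct row numbers in an unspecified order, so add a tie-breaking column to the ordering if the numbering must be deterministic.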
    
