Python Spark Cumulative Sum by Group Using DataFrame

遥遥无期 2020-12-02 22:54

How do I compute a cumulative sum per group in PySpark, specifically using the DataFrame abstraction?

With an example dataframe:
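A minimal example dataframe with the columns the answer below relies on (class, time, value; the names and data here are assumed for illustration) might look like this:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data: two groups, an ordering column, and a value to accumulate
    DF = spark.createDataFrame(
        [("a", 1, 10), ("a", 2, 20), ("a", 3, 30),
         ("b", 1, 5), ("b", 2, 15)],
        ["class", "time", "value"],
    )
    DF.show()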

2 Answers
  •  再見小時候
    2020-12-02 23:21

    I tried it this way and it worked for me.

    from pyspark.sql import Window
    from pyspark.sql import functions as f
    import sys

    # Cumulative sum of 'value' within each 'class', ordered by 'time'.
    # The frame (-sys.maxsize, 0) runs from the start of the partition to the current row.
    window = Window.partitionBy('class').orderBy('time').rowsBetween(-sys.maxsize, 0)
    cum_sum = DF.withColumn('cumsum', f.sum('value').over(window))
    cum_sum.show()
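
    For reference, PySpark also exposes Window.unboundedPreceding and Window.currentRow, which express the same unbounded frame without relying on sys.maxsize. A minimal sketch of the equivalent cumulative sum, assuming the same DF and column names:

    from pyspark.sql import Window
    from pyspark.sql import functions as f

    # Same cumulative sum, with the frame written via the Window constants
    w = (
        Window.partitionBy('class')
        .orderBy('time')
        .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    )
    cum_sum = DF.withColumn('cumsum', f.sum('value').over(w))
    cum_sum.show()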
    
