Python Spark Cumulative Sum by Group Using DataFrame

遥遥无期 2020-12-02 22:54

How do I compute a cumulative sum per group in PySpark, specifically using the DataFrame abstraction?

With an example dataframe:
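A minimal example dataframe with the columns the answer below relies on (class, time, value; the names and data here are assumed for illustration) might look like this:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical sample data: two groups, an ordering column, and a value to accumulate
    DF = spark.createDataFrame(
        [("a", 1, 10), ("a", 2, 20), ("a", 3, 30),
         ("b", 1, 5), ("b", 2, 15)],
        ["class", "time", "value"],
    )
    DF.show()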

2 Answers
  •  再見小時候
    2020-12-02 23:21

    I tried it this way and it worked for me.

    from pyspark.sql import Window
    from pyspark.sql import functions as f
    import sys

    # Cumulative sum of 'value' within each 'class', ordered by 'time'.
    # The frame (-sys.maxsize, 0) runs from the start of the partition to the current row.
    window = Window.partitionBy('class').orderBy('time').rowsBetween(-sys.maxsize, 0)
    cum_sum = DF.withColumn('cumsum', f.sum('value').over(window))
    cum_sum.show()
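
    For reference, PySpark also exposes Window.unboundedPreceding and Window.currentRow, which express the same unbounded frame without relying on sys.maxsize. A minimal sketch of the equivalent cumulative sum, assuming the same DF and column names:

    from pyspark.sql import Window
    from pyspark.sql import functions as f

    # Same cumulative sum, with the frame written via the Window constants
    w = (
        Window.partitionBy('class')
        .orderBy('time')
        .rowsBetween(Window.unboundedPreceding, Window.currentRow)
    )
    cum_sum = DF.withColumn('cumsum', f.sum('value').over(w))
    cum_sum.show()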
    
