How to calculate cumulative sum using sqlContext

Backend · Unresolved · 4 answers · 2161 views

予麋鹿 asked on 2020-12-15 02:11

I know we can use window functions in PySpark to calculate a cumulative sum. But Window is only supported in HiveContext and not in SQLContext. I need to use SQLContext as Hive

4 Answers
  •  陌清茗 (OP)
     2020-12-15 02:19

    It is not true that window functions work only with HiveContext. You can use them with sqlContext as well:

    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # Order within each partition so the sum accumulates row by row;
    # without orderBy, sum().over() yields the partition total instead.
    # The ordering column ('col4') is illustrative.
    myPartition = (Window.partitionBy('col1', 'col2', 'col3')
                         .orderBy('col4')
                         .rowsBetween(Window.unboundedPreceding, Window.currentRow))

    temp = temp.withColumn("#dummy", F.sum(temp.col4).over(myPartition))
    
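    As a sanity check, here is a self-contained sketch of the same idea using a plain SQLContext (no Hive). The data, column names, and ordering column are invented for illustration; it assumes Spark 2.0 or later, where SQLContext supports window functions:

    ```python
    from pyspark import SparkContext
    from pyspark.sql import SQLContext, functions as F
    from pyspark.sql.window import Window

    # Plain SQLContext, no Hive support needed (Spark 2.0+).
    sc = SparkContext("local[1]", "cumsum-demo")
    sqlContext = SQLContext(sc)

    # Toy data; the column names are made up for this example.
    df = sqlContext.createDataFrame(
        [("a", 1), ("a", 2), ("a", 3), ("b", 10), ("b", 20)],
        ["grp", "val"],
    )

    # Running sum inside each group, ordered by val.
    w = (Window.partitionBy("grp")
               .orderBy("val")
               .rowsBetween(Window.unboundedPreceding, Window.currentRow))

    result = df.withColumn("cum_sum", F.sum("val").over(w))
    result.show()
    # cum_sum is 1, 3, 6 for grp 'a' and 10, 30 for grp 'b'
    ```

    Note the `rowsBetween(Window.unboundedPreceding, Window.currentRow)` frame: combined with `orderBy`, it is what turns a per-group total into a running total.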
