I know we can use window functions in PySpark to calculate a cumulative sum. But window functions are only supported in HiveContext and not in SQLContext. I need to use SQLContext, as Hive is not available to me.
It is not true that window functions work only with HiveContext. You can use them with a plain SQLContext as well:
from pyspark.sql.window import Window
from pyspark.sql.functions import sum as sum_  # use Spark's sum, not Python's built-in

# sum of col4 over each (col1, col2, col3) partition
myPartition = Window.partitionBy(['col1', 'col2', 'col3'])
temp = temp.withColumn("#dummy", sum_(temp.col4).over(myPartition))
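Note that without an ordering, the window above gives the total of col4 per partition rather than a running total. A minimal sketch of the cumulative variant, assuming an existing sqlContext and a hypothetical ordering column col_order:

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.window import Window
from pyspark.sql.functions import sum as sum_

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# hypothetical example data; col_order is an assumed ordering column
temp = sqlContext.createDataFrame(
    [("a", "x", "y", 1, 1), ("a", "x", "y", 2, 2), ("b", "x", "y", 3, 1)],
    ["col1", "col2", "col3", "col4", "col_order"],
)

# adding orderBy turns the partition total into a running sum; with an
# ORDER BY, the default frame is rangeBetween(unboundedPreceding, currentRow)
running = Window.partitionBy("col1", "col2", "col3").orderBy("col_order")
temp = temp.withColumn("cum_sum", sum_("col4").over(running))
temp.show()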