How to calculate cumulative sum using sqlContext

Backend · Unresolved · 4 answers · 2171 views
予麋鹿 2020-12-15 02:11

I know we can use Window function in pyspark to calculate cumulative sum. But Window is only supported in HiveContext and not in SQLContext. I need to use SQLContext as Hive
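The standard workaround when window functions are unavailable is a two-pass computation: total each partition, turn those totals into per-partition starting offsets, then accumulate within each partition. A minimal pure-Python sketch of the idea (the partitioned data here is hypothetical, standing in for an RDD's partitions):

```python
from itertools import accumulate

# Hypothetical partitioned data, as an RDD's partitions might look.
partitions = [[1, 2, 3], [4, 5], [6]]

# Pass 1: per-partition totals, turned into starting offsets.
totals = [sum(p) for p in partitions]
offsets = [0] + list(accumulate(totals))[:-1]  # [0, 6, 15]

# Pass 2: cumulative sum within each partition, shifted by its offset.
cumsum = [off + c for p, off in zip(partitions, offsets)
          for c in accumulate(p)]
print(cumsum)  # → [1, 3, 6, 10, 15, 21]
```

In Spark this maps onto collecting the pass-1 totals to the driver and applying pass 2 with `mapPartitionsWithIndex`, so no HiveContext is needed.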

4 Answers
  •  醉话见心
    2020-12-15 02:37

    After landing on this thread while trying to solve a similar problem, I solved my issue with the code below. Not sure if I'm missing part of the OP's question (this computes a grand total, not a cumulative sum), but this is one way to sum a column through SQLContext:

    from pyspark.conf import SparkConf
    from pyspark.context import SparkContext
    from pyspark.sql.context import SQLContext
    
    # Build the configuration before creating the SparkContext;
    # settings applied to a SparkConf afterwards have no effect.
    conf = SparkConf()
    conf.setAppName('Sum SQLContext Column')
    conf.set("spark.executor.memory", "2g")
    
    sc = SparkContext(conf=conf)
    sc.setLogLevel("ERROR")
    sqlContext = SQLContext(sc)
    
    def sum_column(table, column):
        # Returns a one-row DataFrame holding the total of `column`.
        sc_table = sqlContext.table(table)
        return sc_table.agg({column: "sum"})
    
    sum_column("db.tablename", "column").show()
    
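Since the snippet above yields a grand total rather than a running total, one way to get an actual cumulative sum without window functions (so plain SQLContext suffices) is a non-equi self-join grouped by row key. The query below is a sketch with a hypothetical table `t`, key `id`, and value column `v`; it is demonstrated with sqlite3 only because the SQL is portable, and the same statement could be passed to `sqlContext.sql(...)`:

```python
import sqlite3

# Hypothetical table "t" with an ordering key "id" and a value column "v".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (3, 30)])

# Cumulative sum via non-equi self-join: for each row, sum every row
# whose id is <= its own.  No window functions are required.
rows = conn.execute("""
    SELECT a.id, SUM(b.v) AS cum_sum
    FROM t a JOIN t b ON b.id <= a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(rows)  # → [(1, 10), (2, 30), (3, 60)]
```

Note the join is O(n²) in the number of rows, so this is best reserved for modest data sizes; the two-pass partition approach scales better.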
