PySpark - Sum a column in dataframe and return results as int

执念已碎 2020-12-24 08:13

I have a PySpark DataFrame with a column of numbers. I need to sum that column and have the result returned as an int in a Python variable.

df = spark.cr         


        
6 Answers
  •  渐次进展
    2020-12-24 08:40

    Sometimes, when you read a CSV file into a PySpark DataFrame, a numeric column ends up as a string type (for example '23'). In that case, use pyspark.sql.functions.sum, not Python's built-in sum(), to compute the total.

    import pyspark.sql.functions as F
    # An empty groupBy() aggregates over the whole DataFrame; show() only prints the sum
    df.groupBy().agg(F.sum('Number')).show()
    
