Add column sum as new column in PySpark dataframe

粉色の甜心 2020-12-02 22:43

I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a column that is the sum of all the other columns.

Suppose my dataframe has numeric columns a, b, and c, and I want a new column holding their row-wise sum.
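
For example, spelling the columns out by hand works for a small, fixed set of columns but doesn't scale; a minimal sketch, assuming hypothetical columns a, b, and c:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical dataframe with numeric columns a, b and c
    df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ['a', 'b', 'c'])

    # Adding each column by hand works, but doesn't generalize to many columns
    df_total = df.withColumn('total_col', df.a + df.b + df.c)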

8 Answers
  •  遥遥无期
    2020-12-02 23:24

    Summing multiple columns from a list into one column

    The built-in pyspark.sql.functions.sum is an aggregate function, so it does not add columns together row-wise. A row-wise sum across several columns can instead be built with the expr function.

    from pyspark.sql.functions import expr

    cols_list = ['a', 'b', 'c']

    # Build the SQL addition expression "a+b+c" by joining the column names
    expression = '+'.join(cols_list)

    # Evaluate the expression to create the new sum column
    df = df.withColumn('sum_cols', expr(expression))


    This gives us the desired row-wise sum of the listed columns.
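
    As a quick end-to-end check, here is a minimal sketch, assuming a SparkSession named spark and a toy dataframe whose column names and values are made up for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    spark = SparkSession.builder.getOrCreate()

    # Toy dataframe; the names and values are illustrative only
    df = spark.createDataFrame([(1, 2, 3), (4, 5, 6)], ['a', 'b', 'c'])

    cols_list = ['a', 'b', 'c']
    expression = '+'.join(cols_list)  # -> "a+b+c"

    # Each row gets sum_cols = a + b + c (6 and 15 for this toy data)
    df.withColumn('sum_cols', expr(expression)).show()

    Because expr parses a SQL expression string, this also works when the column list is only known at runtime.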
