Add column sum as new column in PySpark dataframe

粉色の甜心 2020-12-02 22:43

I'm using PySpark and I have a Spark dataframe with a bunch of numeric columns. I want to add a column that is the sum of all the other columns.

Suppose my datafram

8 answers
  •  广开言路
    2020-12-02 23:14

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([("linha1", "valor1", 2), ("linha2", "valor2", 5)], ("Columna1", "Columna2", "Columna3"))
    
    df.show()
    
    +--------+--------+--------+
    |Columna1|Columna2|Columna3|
    +--------+--------+--------+
    |  linha1|  valor1|       2|
    |  linha2|  valor2|       5|
    +--------+--------+--------+
    
    # df[2] selects the third column (Columna3) by position; "DivisaoPorDois" means "division by two"
    df = df.withColumn('DivisaoPorDois', df[2]/2)
    df.show()
    
    +--------+--------+--------+--------------+
    |Columna1|Columna2|Columna3|DivisaoPorDois|
    +--------+--------+--------+--------------+
    |  linha1|  valor1|       2|           1.0|
    |  linha2|  valor2|       5|           2.5|
    +--------+--------+--------+--------------+
    
    # Add Columna3 (df[2]) and DivisaoPorDois (df[3]); "Soma_Colunas" means "sum of columns"
    df = df.withColumn('Soma_Colunas', df[2]+df[3])
    df.show()
    
    +--------+--------+--------+--------------+------------+
    |Columna1|Columna2|Columna3|DivisaoPorDois|Soma_Colunas|
    +--------+--------+--------+--------------+------------+
    |  linha1|  valor1|       2|           1.0|         3.0|
    |  linha2|  valor2|       5|           2.5|         7.5|
    +--------+--------+--------+--------------+------------+
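The answer sums two columns picked by position. To sum every numeric column without naming each one, a common approach is to read the column types from `df.dtypes`, keep the numeric ones, and fold the resulting column expressions together with `+` via `functools.reduce`. Below is a minimal sketch under stated assumptions: the helper names `numeric_columns`, `row_sum` and `add_row_sum`, the default output name `total`, and the choice to treat nulls as 0 are all hypothetical, not part of the original answer.

```python
import operator
from functools import reduce

# Spark SQL type names, as reported by DataFrame.dtypes, treated as numeric here.
# Decimal columns are reported as e.g. "decimal(10,2)", so they are matched by prefix.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def numeric_columns(dtypes, exclude=()):
    """Return the names of numeric columns from a list of (name, type) pairs."""
    return [
        name
        for name, dtype in dtypes
        if (dtype in NUMERIC_TYPES or dtype.startswith("decimal"))
        and name not in exclude
    ]


def row_sum(exprs):
    """Fold column expressions (or plain numbers) together with +."""
    if not exprs:
        raise ValueError("no numeric columns to sum")
    return reduce(operator.add, exprs)


def add_row_sum(df, new_col="total", null_as_zero=True):
    """Return df with a new column holding the row-wise sum of all numeric columns."""
    # Imported here so the pure-Python helpers above work without pyspark installed.
    from pyspark.sql import functions as F

    # Exclude the output column itself, in case it already exists.
    cols = numeric_columns(df.dtypes, exclude=(new_col,))
    if null_as_zero:
        # In Spark, a + NULL is NULL, so replace nulls with 0 before adding.
        exprs = [F.coalesce(F.col(c), F.lit(0)) for c in cols]
    else:
        exprs = [F.col(c) for c in cols]
    return df.withColumn(new_col, row_sum(exprs))
```

For example, `add_row_sum(df, "Soma_Total").show()` on the DataFrame above would add a column summing `Columna3`, `DivisaoPorDois` and `Soma_Colunas`, while skipping the two string columns. Using `reduce` keeps the whole sum as a single column expression, so Spark evaluates it row by row without any Python UDF.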
    
