Adding a column of rowsums across a list of columns in Spark Dataframe

Asked by 北恋 on 2020-12-14 18:03

I have a Spark dataframe with several columns. I want to add a column to the dataframe that is the sum of a certain number of those columns.

For example, given numeric columns var1 through var5, I want a new column holding their row-wise sum.

4 Answers
  •  误落风尘
    2020-12-14 18:41

    You should try the following:

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions._
    
    val sc: SparkContext = ...
    val sqlContext = new SQLContext(sc)
    
    import sqlContext.implicits._
    
    // Example data: one ID column and five numeric columns
    val input = sc.parallelize(Seq(
      ("a", 5, 7, 9, 12, 13),
      ("b", 6, 4, 3, 20, 17),
      ("c", 4, 9, 4, 6 , 9),
      ("d", 1, 2, 6, 8 , 1)
    )).toDF("ID", "var1", "var2", "var3", "var4", "var5")
    
    // The columns to sum, as Column expressions
    val columnsToSum = List(col("var1"), col("var2"), col("var3"), col("var4"), col("var5"))
    
    // reduce(_ + _) folds the list into a single var1 + var2 + ... + var5 expression
    val output = input.withColumn("sums", columnsToSum.reduce(_ + _))
    
    output.show()
    

    Then the result is:

    +---+----+----+----+----+----+----+
    | ID|var1|var2|var3|var4|var5|sums|
    +---+----+----+----+----+----+----+
    |  a|   5|   7|   9|  12|  13|  46|
    |  b|   6|   4|   3|  20|  17|  50|
    |  c|   4|   9|   4|   6|   9|  32|
    |  d|   1|   2|   6|   8|   1|  18|
    +---+----+----+----+----+----+----+
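
    If the set of columns is large or only known at runtime, a variation on the same idea (a sketch, assuming the numeric columns all share the `var` prefix) builds the list from `input.columns` instead of naming each one. Wrapping each column in `coalesce` guards against nulls, since adding a null with `+` makes the whole sum null:

        val columnsToSum = input.columns
          .filter(_.startsWith("var"))          // pick the numeric var* columns by name
          .map(c => coalesce(col(c), lit(0)))   // treat null as 0 so the sum stays defined
          .toList
        
        val output = input.withColumn("sums", columnsToSum.reduce(_ + _))

    This produces the same `sums` column as the explicit list above on null-free data.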
    
