How to sum the values of one column of a DataFrame in Spark/Scala

鱼传尺愫 2020-12-08 07:03

I have a DataFrame that I read from a CSV file with many columns, such as: timestamp, steps, heartrate, etc.

I want to sum the values of each column, for instance the total number of steps.

5 Answers
  •  再見小時候
    2020-12-08 07:41

    If you want to sum all the values of one column, it can be more efficient to drop down to the DataFrame's underlying RDD and reduce.

    import sqlContext.implicits._
    import org.apache.spark.sql.functions._
    
    // Build a one-column DataFrame of Ints
    val df = sc.parallelize(Array(10, 2, 3, 4)).toDF("steps")
    
    // Select the column, drop to the RDD, cast each Row value to Int, and reduce
    df.select(col("steps")).rdd.map(_(0).asInstanceOf[Int]).reduce(_ + _)
    
    // res1: Int = 19
    
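    The same sum can also be expressed with the DataFrame API itself, which keeps the aggregation inside Catalyst rather than materializing Rows into an RDD. A minimal sketch, assuming the same `df` with an Int column named "steps" (note that Spark's `sum` over an Int column yields a Long):

    ```scala
    import org.apache.spark.sql.functions.sum
    
    // Aggregate the "steps" column; the result is a single-row DataFrame
    val total: Long = df.agg(sum("steps")).first.getLong(0)
    // total should be 19 for the data above
    ```

    Which form is faster depends on the Spark version and data size; for typed DataFrames the built-in aggregate is usually the idiomatic choice.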
