How to sum the values of one column of a DataFrame in Spark/Scala

鱼传尺愫 2020-12-08 07:03

I have a DataFrame that I read from a CSV file with many columns, such as: timestamp, steps, heartrate, etc.

I want to sum the values of each column, for instance the total number of steps.

5 Answers
  •  再見小時候
    2020-12-08 07:41

    If you want to sum all the values of one column, it can be more efficient to drop down to the DataFrame's underlying RDD and reduce.

    import sqlContext.implicits._
    import org.apache.spark.sql.functions._
    
    // Build a one-column DataFrame of Ints
    val df = sc.parallelize(Array(10, 2, 3, 4)).toDF("steps")
    
    // Select the column, drop to the RDD, cast each Row value to Int, and reduce
    df.select(col("steps")).rdd.map(_(0).asInstanceOf[Int]).reduce(_ + _)
    
    // res1: Int = 19
    
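    The same sum can also be expressed with the DataFrame API itself, which keeps the aggregation inside Catalyst rather than materializing Rows into an RDD. A minimal sketch, assuming the same `df` with an Int column named "steps" (note that Spark's `sum` over an Int column yields a Long):

    ```scala
    import org.apache.spark.sql.functions.sum
    
    // Aggregate the "steps" column; the result is a single-row DataFrame
    val total: Long = df.agg(sum("steps")).first.getLong(0)
    // total should be 19 for the data above
    ```

    Which form is faster depends on the Spark version and data size; for typed DataFrames the built-in aggregate is usually the idiomatic choice.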
