I have user logs that I loaded from a CSV and converted into a DataFrame in order to leverage the SparkSQL querying features. A single user will create numerous entries
The accepted code does not compile, as it has a typo (as pointed out by MRez). The snippet below works and is tested.
For Spark 2.0+:
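import org.apache.spark.sql.functions._

// Alias the result of each aggregate, not the input column;
// otherwise the output columns do not get the names avg, stdev, etc.
val _avg_std = df.groupBy("user").agg(
  avg(col("duration")).alias("avg"),
  stddev(col("duration")).alias("stdev"), // stddev is an alias for stddev_samp
  stddev_pop(col("duration")).alias("stdev_pop"),
  stddev_samp(col("duration")).alias("stdev_samp")
)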
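
In case it helps when choosing between the variants: stddev_pop divides the sum of squared deviations by n, while stddev_samp (and stddev, which is an alias for it) divides by n - 1. Below is a minimal plain-Scala sketch of the two formulas, without Spark. The object name StdDevSketch, the helper names, and the sample durations are made up for illustration and are not part of the Spark API.

```scala
// Plain-Scala illustration of population vs. sample standard deviation.
// StdDevSketch, stddevPop and stddevSamp are hypothetical names, not Spark functions.
object StdDevSketch {
  // Population standard deviation: divide the squared deviations by n.
  def stddevPop(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    math.sqrt(xs.map(x => math.pow(x - mean, 2)).sum / xs.size)
  }

  // Sample standard deviation: divide by n - 1 (Bessel's correction).
  def stddevSamp(xs: Seq[Double]): Double = {
    val mean = xs.sum / xs.size
    math.sqrt(xs.map(x => math.pow(x - mean, 2)).sum / (xs.size - 1))
  }

  def main(args: Array[String]): Unit = {
    // Made-up durations for a single user.
    val durations = Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)
    println(stddevPop(durations))  // prints 2.0
    println(stddevSamp(durations)) // slightly larger, since it divides by n - 1
  }
}
```

With many entries per user the two values are close; the difference only matters for users with very few rows.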