Based on the following DataFrame:
val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")
+---+-----+----+
| ID|
I'm using a different example from yours, but it shows that several aggregate functions can be passed to a single agg call. Adapt it to your DataFrame accordingly:
// In 1.3.x, in order for the grouping column "department" to show up,
// it must be included explicitly as part of the agg function call.
df.groupBy("department").agg($"department", max("age"), sum("expense"))
// In 1.4+, grouping column "department" is included automatically.
df.groupBy("department").agg(max("age"), sum("expense"))
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Local session for trying this out; in spark-shell, `spark` already exists.
val spark: SparkSession = SparkSession
  .builder.master("local")
  .appName("MyGroup")
  .getOrCreate()
import spark.implicits._ // needed for .toDF on an RDD of tuples

val client: DataFrame = spark.sparkContext.parallelize(
  Seq((1,"A",10),(2,"A",5),(3,"B",56))
).toDF("ID","Categ","Amnt")

// One groupBy, two aggregates: total amount and number of IDs per category.
client.groupBy("Categ").agg(sum("Amnt"),count("ID")).show()
+-----+---------+---------+
|Categ|sum(Amnt)|count(ID)|
+-----+---------+---------+
|    B|       56|        1|
|    A|       15|        2|
+-----+---------+---------+
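
If the default column names such as sum(Amnt) are awkward to work with downstream, one option is to alias each aggregate inside the same agg call. The following is only a sketch along the lines of the example above: the names TotalAmnt and NumClients are made up for illustration, and the orderBy is added just to make the row order predictable.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, sum}

// Local session for experimenting; reuse the existing `spark` in spark-shell.
val spark = SparkSession.builder
  .master("local[*]")
  .appName("NamedAggregates")
  .getOrCreate()
import spark.implicits._

// Same sample data as in the question.
val client = Seq((1, "A", 10), (2, "A", 5), (3, "B", 56)).toDF("ID", "Categ", "Amnt")

// Two aggregates in one pass, each with an explicit (hypothetical) column name.
val summary = client
  .groupBy("Categ")
  .agg(
    sum("Amnt").as("TotalAmnt"),   // total amount per category
    count("ID").as("NumClients")   // number of rows per category
  )
  .orderBy("Categ")                // sort only so the output order is stable

summary.show()
```

With aliases in place, later steps can refer to the results by name, for example summary.select("Categ", "TotalAmnt"), instead of quoting the generated sum(Amnt) header.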