Based on the following DataFrame:

val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")
+---+-----+----+
| ID|Categ|Amnt|
+---+-----+----+
|  1|    A|  10|
|  2|    A|   5|
|  3|    B|  56|
+---+-----+----+

There are multiple ways to do aggregate functions in Spark:
1. Using agg with a Map from column name to aggregate function:
val aggdf = client.groupBy('Categ).agg(Map("ID" -> "count", "Amnt" -> "sum"))
aggdf.show(false)
+-----+---------+---------+
|Categ|count(ID)|sum(Amnt)|
+-----+---------+---------+
|B |1 |56 |
|A |2 |15 |
+-----+---------+---------+
//Rename and sort as needed.
aggdf.sort('Categ).withColumnRenamed("count(ID)","Count").withColumnRenamed("sum(Amnt)","sum").show(false)
+-----+-----+---+
|Categ|Count|sum|
+-----+-----+---+
|A |2 |15 |
|B |1 |56 |
+-----+-----+---+
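If the chained withColumnRenamed calls get noisy, one alternative is to rename every column positionally with toDF; a minimal sketch, assuming the aggregate output columns come back in the order Categ, count(ID), sum(Amnt):

// Rename all columns at once by position, then sort.
// The target names "Count" and "Sum" are illustrative.
aggdf.toDF("Categ", "Count", "Sum").sort("Categ").show(false)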
2. Using the typed aggregate functions from org.apache.spark.sql.functions, aliased inline:
import org.apache.spark.sql.functions._
client.groupBy('Categ).agg(count("ID").as("count"), sum("Amnt").as("sum")).show(false)
+-----+-----+---+
|Categ|count|sum|
+-----+-----+---+
|B |1 |56 |
|A |2 |15 |
+-----+-----+---+
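The same agg call composes if you need several aggregates at once; a sketch (avg and max are illustrative additions, not part of the original example):

import org.apache.spark.sql.functions._
client.groupBy('Categ)
  .agg(count("ID").as("count"), sum("Amnt").as("sum"),
       avg("Amnt").as("avg"), max("Amnt").as("max"))
  .show(false)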
3. Using the java.util.Map overload of agg, here with Guava's ImmutableMap:
import com.google.common.collect.ImmutableMap
client.groupBy('Categ).agg(ImmutableMap.of("ID", "count", "Amnt", "sum")).show(false)
+-----+---------+---------+
|Categ|count(ID)|sum(Amnt)|
+-----+---------+---------+
|B |1 |56 |
|A |2 |15 |
+-----+---------+---------+
//Rename the columns as required.
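If you would rather not pull in Guava, the same java.util.Map overload of agg accepts a converted Scala Map; a minimal sketch using the standard converters:

import scala.collection.JavaConverters._ // scala.jdk.CollectionConverters on Scala 2.13+
client.groupBy('Categ)
  .agg(Map("ID" -> "count", "Amnt" -> "sum").asJava)
  .show(false)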
4. If you are a SQL expert, you can do it with plain SQL too:
client.createOrReplaceTempView("df")
val aggdf = spark.sql("select Categ, count(ID),sum(Amnt) from df group by Categ")
aggdf.show()
+-----+---------+---------+
|Categ|count(ID)|sum(Amnt)|
+-----+---------+---------+
| B| 1| 56|
| A| 2| 15|
+-----+---------+---------+
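As with the other approaches, you can alias and order the result directly in the SQL; a sketch (the aliases Count and Sum are illustrative):

val aggdf2 = spark.sql(
  "select Categ, count(ID) as Count, sum(Amnt) as Sum from df group by Categ order by Categ")
aggdf2.show()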