How to calculate sum and count in a single groupBy?

醉梦人生 2020-12-28 16:03

Based on the following DataFrame:

val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")
+---+-----+----+
| ID|Categ|Amnt|
+---+-----+----+
|  1|    A|  10|
|  2|    A|   5|
|  3|    B|  56|
+---+-----+----+
3 Answers
  •  甜味超标
    2020-12-28 16:32

    There are multiple ways to compute aggregate functions in Spark:

    val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")
    

    1.

    val aggdf = client.groupBy('Categ).agg(Map("ID"->"count","Amnt"->"sum"))
    
    +-----+---------+---------+
    |Categ|count(ID)|sum(Amnt)|
    +-----+---------+---------+
    |B    |1        |56       |
    |A    |2        |15       |
    +-----+---------+---------+
    
    //Rename and sort as needed.
    aggdf.sort('Categ).withColumnRenamed("count(ID)","Count").withColumnRenamed("sum(Amnt)","sum")
    +-----+-----+---+
    |Categ|Count|sum|
    +-----+-----+---+
    |A    |2    |15 |
    |B    |1    |56 |
    +-----+-----+---+
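    
    A further option (a sketch, not from the original answer): toDF can rename all the columns in one call instead of chaining withColumnRenamed, assuming the column order shown above.
    
    aggdf.sort('Categ).toDF("Categ","Count","Sum")
    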
    

    2.

    import org.apache.spark.sql.functions._
    client.groupBy('Categ).agg(count("ID").as("count"),sum("Amnt").as("sum"))
    +-----+-----+---+
    |Categ|count|sum|
    +-----+-----+---+
    |B    |1    |56 |
    |A    |2    |15 |
    +-----+-----+---+
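    
    Additional aggregates and an ordering can be chained in the same call; a minimal sketch (the avg column and orderBy are only an illustration, not part of the original question):
    
    client.groupBy('Categ)
      .agg(count("ID").as("count"), sum("Amnt").as("sum"), avg("Amnt").as("avg"))
      .orderBy('Categ)
      .show()
    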
    

    3.

    import com.google.common.collect.ImmutableMap;
    client.groupBy('Categ).agg(ImmutableMap.of("ID", "count", "Amnt", "sum"))
    +-----+---------+---------+
    |Categ|count(ID)|sum(Amnt)|
    +-----+---------+---------+
    |B    |1        |56       |
    |A    |2        |15       |
    +-----+---------+---------+
    //Rename the columns afterwards if required.
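    
    Note (an assumption on my part, not from the original answer): ImmutableMap is only needed to satisfy the java.util.Map overload of agg; a plain Scala Map converted with the standard collection converters works just as well, so the Guava import is optional.
    
    import scala.jdk.CollectionConverters._ //Scala 2.13; on 2.12 use scala.collection.JavaConverters._ instead
    client.groupBy('Categ).agg(Map("ID" -> "count", "Amnt" -> "sum").asJava)
    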
    

    4. If you are a SQL expert, you can do this too:

    client.createOrReplaceTempView("df")
    
     val aggdf = spark.sql("select Categ, count(ID),sum(Amnt) from df group by Categ")
     aggdf.show()
    
        +-----+---------+---------+
        |Categ|count(ID)|sum(Amnt)|
        +-----+---------+---------+
        |    B|        1|       56|
        |    A|        2|       15|
        +-----+---------+---------+
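    
    Column aliases can also be given directly in the SQL text to avoid the generated names count(ID) and sum(Amnt); a small sketch:
    
     spark.sql("select Categ, count(ID) as Count, sum(Amnt) as Sum from df group by Categ order by Categ").show()
    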
    
