How to calculate sum and count in a single groupBy?

醉梦人生 | 2020-12-28 16:03

Based on the following DataFrame:

val client = Seq((1,"A",10),(2,"A",5),(3,"B",56)).toDF("ID","Categ","Amnt")

+---+-----+----+
| ID|Categ|Amnt|
+---+-----+----+
|  1|    A|  10|
|  2|    A|   5|
|  3|    B|  56|
+---+-----+----+

For each Categ, how can I compute both the sum of Amnt and the count of rows in a single groupBy?
3 Answers
  •  离开以前
    2020-12-28 16:21

    I'm giving a different example than yours.

    Multiple aggregate functions can be combined in a single groupBy like this; adapt it to your own columns accordingly.

    // In 1.3.x, in order for the grouping column "department" to show up,
    // it must be included explicitly as part of the agg function call.
    df.groupBy("department").agg($"department", max("age"), sum("expense"))
    
    // In 1.4+, grouping column "department" is included automatically.
    df.groupBy("department").agg(max("age"), sum("expense"))
    
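    If you want readable column names in the result, each aggregate can also be aliased; a minimal sketch (the alias names maxAge and totalExpense are just illustrative):

      df.groupBy("department").agg(max("age").as("maxAge"), sum("expense").as("totalExpense"))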

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions._

    val spark: SparkSession = SparkSession
      .builder
      .master("local")
      .appName("MyGroup")
      .getOrCreate()
    import spark.implicits._

    val client: DataFrame = spark.sparkContext
      .parallelize(Seq((1, "A", 10), (2, "A", 5), (3, "B", 56)))
      .toDF("ID", "Categ", "Amnt")

    // sum and count are computed together, in a single pass per group
    client.groupBy("Categ").agg(sum("Amnt"), count("ID")).show()
    

    +-----+---------+---------+
    |Categ|sum(Amnt)|count(ID)|
    +-----+---------+---------+
    |    B|       56|        1|
    |    A|       15|        2|
    +-----+---------+---------+
    
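    The same result can also be written with the Map-based agg overload, where keys are column names and values are aggregate function names; a minimal sketch, assuming the same client DataFrame as above (note the result columns keep their generated names like sum(Amnt)):

      client.groupBy("Categ").agg(Map("Amnt" -> "sum", "ID" -> "count")).show()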
