How to slice and sum elements of array column?

Backend · Open · 6 answers · 1700 views
暖寄归人 2020-12-03 13:09

I would like to sum (or perform other aggregate functions on) an array column using Spark SQL.

I have a table such as:

+-------+-------+-------------------------+
|dept_id|dept_nm|emp_details              |
+-------+-------+-------------------------+
|10     |Finance|[100, 200, 300, 400, 500]|
|20     |IT     |[10, 20, 50, 100]        |
+-------+-------+-------------------------+
6 Answers
  •  时光说笑
    2020-12-03 13:26

    The RDD way is missing from the other answers, so let me add it.

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StructField}
    import scala.collection.mutable.WrappedArray
    
    val df = Seq((10, "Finance", Array(100, 200, 300, 400, 500)),
                 (20, "IT", Array(10, 20, 50, 100))).toDF("dept_id", "dept_nm", "emp_details")
    
    // Append the array total and the sum of its first two elements to each row.
    val rdd1 = df.rdd.map { x =>
      val p = x.getAs[WrappedArray[Int]]("emp_details").toArray
      Row.merge(x, Row(p.sum, p.slice(0, 2).sum))
    }
    
    spark.createDataFrame(rdd1, df.schema.add(StructField("sumArray", IntegerType))
      .add(StructField("sliceArray", IntegerType))).show(false)
    

    Output:

    +-------+-------+-------------------------+--------+----------+
    |dept_id|dept_nm|emp_details              |sumArray|sliceArray|
    +-------+-------+-------------------------+--------+----------+
    |10     |Finance|[100, 200, 300, 400, 500]|1500    |300       |
    |20     |IT     |[10, 20, 50, 100]        |180     |30        |
    +-------+-------+-------------------------+--------+----------+
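
    On Spark 2.4 and later, the same result can be sketched without dropping to the RDD API at all, using the built-in SQL higher-order functions `aggregate` and `slice` (this assumes the `df` defined above; note that Spark SQL's `slice` is 1-indexed):

    ```scala
    // Spark 2.4+ sketch: sum the whole array and its first two elements with
    // SQL higher-order functions instead of an RDD map (assumes `df` above).
    df.selectExpr(
      "dept_id", "dept_nm", "emp_details",
      "aggregate(emp_details, 0, (acc, x) -> acc + x) AS sumArray",
      "aggregate(slice(emp_details, 1, 2), 0, (acc, x) -> acc + x) AS sliceArray"
    ).show(false)
    ```

    This keeps everything inside the Catalyst optimizer and avoids the schema bookkeeping that the RDD round-trip requires.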
    
