How to calculate the median in Spark sqlContext for a column of data type double

温柔的废话 2020-12-03 15:48

I have given the sample table. I want to get the median of the "value" column for each group in the "source" column. The source column is of String data type; the value column is of

3 answers
  •  长情又很酷
    2020-12-03 16:25

    Here is how it can be done using Spark Scala DataFrame functions. This is based on how Imputer implements the median strategy in Spark >= 2.2: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala

      df.select(colName)
        .stat
        .approxQuantile(colName, Array(0.5), 0.001) // 0.5 quantile = median; 0.001 = relative error
        .head                                       // approxQuantile returns an Array[Double]
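
    Note that approxQuantile works on the whole column, so the snippet above gives a single median rather than one per "source" group. For a per-group median, one option is the SQL aggregate percentile_approx inside groupBy/agg. The following is only a minimal sketch: the sample rows, the object name GroupedMedian and the local SparkSession are assumptions made for illustration, since the original table is not shown. Only the column names "source" and "value" come from the question.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.expr

    object GroupedMedian {
      def main(args: Array[String]): Unit = {
        // Local session for illustration only (assumption); reuse your own session in practice.
        val spark = SparkSession.builder()
          .appName("grouped-median")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Hypothetical sample data standing in for the table from the question.
        val df = Seq(
          ("a", 1.0), ("a", 2.0), ("a", 3.0),
          ("b", 10.0), ("b", 20.0), ("b", 30.0)
        ).toDF("source", "value")

        // percentile_approx is an aggregate function, so it can be applied per group.
        // 0.5 asks for the 50th percentile, i.e. an approximate median.
        val medians = df
          .groupBy("source")
          .agg(expr("percentile_approx(value, 0.5)").as("median"))

        medians.show()
        spark.stop()
      }
    }
    ```

    Going through expr keeps this usable on Spark versions where percentile_approx is only available as a SQL function. The result is approximate; percentile_approx accepts an optional third accuracy argument if a tighter estimate is needed.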
    
