I have this test data:
val data = List(
List(47.5335D),
List(67.5335D),
List(69.5335D),
List(444.1235D),
List(677.53
This is the result from my local machine. Are you doing something similar?
val data = List(
List(47.5335D),
List(67.5335D),
List(69.5335D),
List(444.1235D),
List(677.5335D)
)
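import spark.implicits._  // already in scope in spark-shell; needed elsewhere for toDF
val df = data.flatten.toDF  // one Double column, named "value" by default
// Arguments: column name, probabilities, relativeError
df.stat.approxQuantile("value", Array(0.5), 0)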
// res18: Array[Double] = Array(67.5335)
Note that this is an approximate quantile computation. It is not guaranteed to give you the exact answer every time. See here for a more thorough explanation.
The reason is that for very large datasets an approximate answer is often acceptable, as long as you get it significantly faster than the exact computation would take.
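To make the difference concrete, here is a minimal plain-Scala sketch (no Spark required) of what the exact computation does for the data above: sort everything and pick the middle element. It also evaluates the rank bound stated in the approxQuantile documentation, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). The names ExactMedianSketch, exactMedian and rankBounds are hypothetical helpers made up for this illustration and are not part of Spark.

```scala
// Plain Scala illustration; nothing here calls Spark.
object ExactMedianSketch {
  // Same five values as in the example above.
  val values: Vector[Double] =
    Vector(47.5335, 67.5335, 69.5335, 444.1235, 677.5335)

  // Exact median: sort all values, then take the middle one
  // (or the mean of the two middle ones for an even count).
  // The full sort is what becomes expensive on very large datasets.
  def exactMedian(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "median of an empty collection is undefined")
    val sorted = xs.sorted
    val n = sorted.length
    if (n % 2 == 1) sorted(n / 2)
    else (sorted(n / 2 - 1) + sorted(n / 2)) / 2.0
  }

  // Rank bound from the approxQuantile documentation:
  // floor((p - err) * N) <= rank(x) <= ceil((p + err) * N)
  def rankBounds(n: Int, p: Double, err: Double): (Int, Int) =
    (math.floor((p - err) * n).toInt, math.ceil((p + err) * n).toInt)

  def main(args: Array[String]): Unit = {
    println(exactMedian(values))                 // 69.5335
    println(rankBounds(values.length, 0.5, 0.0)) // (2,3)
  }
}
```

With N = 5, p = 0.5 and err = 0 the documented bound allows ranks 2 through 3. If ranks are counted from 1 (an assumption, the documentation does not spell this out), that covers both 67.5335 and 69.5335, so the value returned above may still fall within the stated guarantee even though it differs from the textbook median.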
I encountered a similar problem when using the approxQuantile() method with Spark-2.2.1. After I upgraded to Spark-2.4.3, approxQuantile() returned the exact median.