I would like to calculate group quantiles on a Spark dataframe (using PySpark). Either an approximate or exact result would be fine. I prefer a solution that I can use withi
The problem with "percentile_approx(val, 0.5)": for values such as [1,2,3,4] it returns 2 as the median, whereas the function below returns 2.5:
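import statistics
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType
# float() matters: a UDF declared as DoubleType can yield null when it returns a Python int
median_udf = F.udf(lambda x: float(statistics.median(x)) if x else None, DoubleType())
... .groupBy('something').agg(median_udf(F.collect_list(F.col('value'))).alias('median'))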
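
To see the difference between the two results without a Spark session, here is a minimal plain-Python sketch. The helper name `group_medians` and the sample rows are made up for illustration. `statistics.median_low` is used only as a stand-in that picks an existing element, like the 2 reported above; it is not how percentile_approx is implemented.

```python
import statistics


def group_medians(rows):
    """Group (key, value) pairs and compute two median flavours per key."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return {
        key: {
            # Picks an actual element of the group (lower of the two middle values).
            "median_low": float(statistics.median_low(vals)),
            # Averages the two middle values when the group size is even.
            "median": float(statistics.median(vals)),
        }
        for key, vals in groups.items()
    }


# Hypothetical sample data: group "a" has an even count, group "b" an odd count.
rows = [("a", 1), ("a", 2), ("a", 3), ("a", 4), ("b", 10), ("b", 20), ("b", 30)]
result = group_medians(rows)
print(result["a"])  # {'median_low': 2.0, 'median': 2.5}
print(result["b"])  # {'median_low': 20.0, 'median': 20.0}
```

The two only disagree for groups with an even number of values. If collecting every group into a Python list turns out to be too slow, Spark SQL's exact `percentile` aggregate, called through `F.expr`, may be worth checking as an alternative; I have not benchmarked it here.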