Median / quantiles within PySpark groupBy

感情败类 2020-12-04 15:26

I would like to calculate group quantiles on a Spark dataframe (using PySpark). Either an approximate or exact result would be fine. I prefer a solution that I can use withi

5 answers

北海茫月 (original poster)

2020-12-04 16:16
Since you have access to percentile_approx, one simple solution would be to use it in a SQL command:
```
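from pyspark.sql import SparkSession

# SQLContext and registerTempTable are deprecated; use the SparkSession entry point.
spark = SparkSession.builder.getOrCreate()

# Expose the existing DataFrame `df` to SQL under the name "df".
df.createOrReplaceTempView("df")
# percentile_approx(val, 0.5) gives an approximate median of val for each grp.
df2 = spark.sql("select grp, percentile_approx(val, 0.5) as med_val from df group by grp")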
```