Adding a group count column to a PySpark dataframe

Asked by 死守一世寂寞 on 2020-11-29 08:50

I am coming from R and the tidyverse to PySpark due to its superior Spark handling, and I am struggling to map certain concepts from one context to the other.

In particular, …

3 Answers
  •  迷失自我
    2020-11-29 09:02

    As an addendum to @pault's answer:

    import pyspark.sql.functions as F
    
    ...
    
    # Count the rows in each group of x and name the result column 'n'
    (df
    .groupBy(F.col('x'))
    .agg(F.count('x').alias('n'))
    .show())
    
    #+---+---+
    #|  x|  n|
    #+---+---+
    #|  b|  1|
    #|  a|  3|
    #+---+---+
    

    enjoy
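    Note that the aggregation above collapses the dataframe to one row per group. If the goal suggested by the question title is to keep every original row and attach the group size as a new column (the tidyverse `add_count` pattern), a window function can do that instead. A minimal sketch, assuming a toy dataframe with the same values of `x` (three `a` rows, one `b` row) as in the output above:

    ```python
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession, Window

    spark = SparkSession.builder.master("local[1]").appName("group-count").getOrCreate()

    # Toy data matching the counts shown in the aggregated output
    df = spark.createDataFrame([("a",), ("a",), ("a",), ("b",)], ["x"])

    # Partition by x and count within each partition; every row keeps
    # its place and gains an 'n' column with its group's size
    w = Window.partitionBy("x")
    result = df.withColumn("n", F.count("x").over(w))

    # Each 'a' row carries n=3, the 'b' row carries n=1
    result.show()
    ```

    Unlike `groupBy().agg()`, this preserves the original row count, which matters when you want to filter or derive further columns from the group size.
    
    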
