Adding a group count column to a PySpark dataframe

死守一世寂寞 · 2020-11-29 08:50

I am coming from R and the tidyverse to PySpark due to its superior Spark handling, and I am struggling to map certain concepts from one context to the other.

In par

3 Answers
  •  無奈伤痛
    2020-11-29 09:08

    Great answer, @David Bruce Borenstein! I found we can get even closer to the tidyverse example:

    from pyspark.sql import Window
    import pyspark.sql.functions as f

    w = Window.partitionBy('x')
    df.withColumn('n', f.count('x').over(w)).sort('x', 'y').show()
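    For reference, here is a self-contained sketch of the same idea with a hypothetical toy dataframe (the `x`/`y` column names and the sample rows are assumptions, not from the original question). A window partitioned by `x` lets `count` attach the group size to every row without collapsing the dataframe, which is what tidyverse's `add_count(x)` does:

    ```python
    from pyspark.sql import SparkSession, Window
    import pyspark.sql.functions as f

    spark = SparkSession.builder.master("local[1]").appName("group_count").getOrCreate()

    # Hypothetical toy data standing in for the question's df
    df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["x", "y"])

    # Count rows per partition of x and append the result as column n,
    # keeping every original row (unlike groupBy().count(), which aggregates)
    w = Window.partitionBy("x")
    counted = df.withColumn("n", f.count("x").over(w))
    counted.sort("x", "y").show()
    # the two 'a' rows get n = 2, the single 'b' row gets n = 1
    ```

    The alternative, a `groupBy('x').count()` followed by a join back onto `df`, gives the same result but costs an extra shuffle and a join; the window form stays a single expression.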
    
