Adding a group count column to a PySpark dataframe

死守一世寂寞 · 2020-11-29 08:50

I am coming from R and the tidyverse to PySpark due to its superior Spark handling, and I am struggling to map certain concepts from one context to the other.

In par

3 Answers
  •  無奈伤痛
    2020-11-29 09:08

    Great answer, @David Bruce Borenstein! I found we can get even closer to the tidyverse example:

    from pyspark.sql import Window
    import pyspark.sql.functions as f

    w = Window.partitionBy('x')
    df.withColumn('n', f.count('x').over(w)).sort('x', 'y').show()
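    For reference, here is a self-contained sketch of the same idea with a hypothetical toy dataframe (the `x`/`y` column names and the sample rows are assumptions, not from the original question). A window partitioned by `x` lets `count` attach the group size to every row without collapsing the dataframe, which is what tidyverse's `add_count(x)` does:

    ```python
    from pyspark.sql import SparkSession, Window
    import pyspark.sql.functions as f

    spark = SparkSession.builder.master("local[1]").appName("group_count").getOrCreate()

    # Hypothetical toy data standing in for the question's df
    df = spark.createDataFrame([("a", 1), ("a", 2), ("b", 3)], ["x", "y"])

    # Count rows per partition of x and append the result as column n,
    # keeping every original row (unlike groupBy().count(), which aggregates)
    w = Window.partitionBy("x")
    counted = df.withColumn("n", f.count("x").over(w))
    counted.sort("x", "y").show()
    # the two 'a' rows get n = 2, the single 'b' row gets n = 1
    ```

    The alternative, a `groupBy('x').count()` followed by a join back onto `df`, gives the same result but costs an extra shuffle and a join; the window form stays a single expression.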
    
