Adding a group count column to a PySpark dataframe

Asked by 死守一世寂寞 on 2020-11-29 08:50

I am coming from R and the tidyverse to PySpark due to its superior Spark handling, and I am struggling to map certain concepts from one context to the other.

In particular, …

3 Answers
  •  迷失自我
    2020-11-29 09:02

    As an addendum to @pault's answer:

    import pyspark.sql.functions as F
    
    ...
    
    # Count the rows in each group of x and name the result column 'n'
    (df
    .groupBy(F.col('x'))
    .agg(F.count('x').alias('n'))
    .show())
    
    #+---+---+
    #|  x|  n|
    #+---+---+
    #|  b|  1|
    #|  a|  3|
    #+---+---+
    

    enjoy
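    Note that the aggregation above collapses the dataframe to one row per group. If the goal suggested by the question title is to keep every original row and attach the group size as a new column (the tidyverse `add_count` pattern), a window function can do that instead. A minimal sketch, assuming a toy dataframe with the same values of `x` (three `a` rows, one `b` row) as in the output above:

    ```python
    import pyspark.sql.functions as F
    from pyspark.sql import SparkSession, Window

    spark = SparkSession.builder.master("local[1]").appName("group-count").getOrCreate()

    # Toy data matching the counts shown in the aggregated output
    df = spark.createDataFrame([("a",), ("a",), ("a",), ("b",)], ["x"])

    # Partition by x and count within each partition; every row keeps
    # its place and gains an 'n' column with its group's size
    w = Window.partitionBy("x")
    result = df.withColumn("n", F.count("x").over(w))

    # Each 'a' row carries n=3, the 'b' row carries n=1
    result.show()
    ```

    Unlike `groupBy().agg()`, this preserves the original row count, which matters when you want to filter or derive further columns from the group size.
    
    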
