Question
I have a huge list of titles. I want to count how often each title occurs in the whole dataset. For example:
`title`
A
b
A
c
c
c
Desired output:
title fre
A 2
b 1
c 3
Answer 1:
You can just groupBy `title` and then count:
import pyspark.sql.functions as f
df.groupBy('title').agg(f.count('*').alias('count')).show()
+-----+-----+
|title|count|
+-----+-----+
| A| 2|
| c| 3|
| b| 1|
+-----+-----+
Or more concisely:
df.groupBy('title').count().show()
+-----+-----+
|title|count|
+-----+-----+
| A| 2|
| c| 3|
| b| 1|
+-----+-----+
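If Spark is not available, a minimal sketch of the same frequency count on a plain in-memory list (an assumption, not part of the answer above) can use the standard library's collections.Counter:

```python
from collections import Counter

# Minimal sketch: the titles are assumed to fit in memory as a plain list;
# Counter produces the same frequencies without a Spark session.
titles = ["A", "b", "A", "c", "c", "c"]
freq = Counter(titles)
print(freq)  # Counter({'c': 3, 'A': 2, 'b': 1})
```

This is only suitable for data that fits on one machine; for a genuinely huge dataset, the Spark groupBy approach above distributes the work.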
Answer 2:
Hi, you can do that with pandas:
import pandas as pd
title=["A","b","A","c","c","c"]
pd.Series(title).value_counts()
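To get the two-column "title / fre" table from the question rather than a Series, one possible follow-up (the column names here are taken from the question, not from the answer) is to reset the index:

```python
import pandas as pd

# Count each title, then turn the resulting Series into a
# two-column DataFrame named after the question's expected output.
title = ["A", "b", "A", "c", "c", "c"]
counts = pd.Series(title).value_counts()
freq = counts.rename_axis("title").reset_index(name="fre")
print(freq)
```

Note that value_counts sorts by frequency in descending order, so the rows come out as c, A, b rather than in the original order.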
Source: https://stackoverflow.com/questions/65657330/how-count-in-pyspark