Question
I have a huge list of titles. I want to count how often each title occurs in the whole dataset. For example:
`title`
A
b
A
c
c
c
Desired output:
title fre
A 2
b 1
c 3
Answer 1:
You can just groupBy `title` and then count:
import pyspark.sql.functions as f
df.groupBy('title').agg(f.count('*').alias('count')).show()
+-----+-----+
|title|count|
+-----+-----+
| A| 2|
| c| 3|
| b| 1|
+-----+-----+
Or more concisely:
df.groupBy('title').count().show()
+-----+-----+
|title|count|
+-----+-----+
| A| 2|
| c| 3|
| b| 1|
+-----+-----+
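If Spark is not available, a minimal sketch of the same frequency count on a plain in-memory list (an assumption, not part of the answer above) can use the standard library's collections.Counter:

```python
from collections import Counter

# Minimal sketch: the titles are assumed to fit in memory as a plain list;
# Counter produces the same frequencies without a Spark session.
titles = ["A", "b", "A", "c", "c", "c"]
freq = Counter(titles)
print(freq)  # Counter({'c': 3, 'A': 2, 'b': 1})
```

This is only suitable for data that fits on one machine; for a genuinely huge dataset, the Spark groupBy approach above distributes the work.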
Answer 2:
Hi, you can do that with pandas:
import pandas as pd
title=["A","b","A","c","c","c"]
pd.Series(title).value_counts()
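To get the two-column "title / fre" table from the question rather than a Series, one possible follow-up (the column names here are taken from the question, not from the answer) is to reset the index:

```python
import pandas as pd

# Count each title, then turn the resulting Series into a
# two-column DataFrame named after the question's expected output.
title = ["A", "b", "A", "c", "c", "c"]
counts = pd.Series(title).value_counts()
freq = counts.rename_axis("title").reset_index(name="fre")
print(freq)
```

Note that value_counts sorts by frequency in descending order, so the rows come out as c, A, b rather than in the original order.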
Source: https://stackoverflow.com/questions/65657330/how-count-in-pyspark