Count the 100 most frequent words from sentences in a Pandas DataFrame

清酒与你 2020-12-05 03:33

I have text reviews in one column of a Pandas DataFrame, and I want to count the N most frequent words with their frequency counts (across the whole column, NOT within a single cell). One a
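A minimal sketch of the task using `collections.Counter` (the "Counter method" benchmarked in the answer). It assumes the reviews live in a column named `text`, that words are whitespace-separated, and that the column has no missing values (a NaN cell makes `str.join` raise a `TypeError`). The sample data and the helper name `top_words` are hypothetical.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; the real DataFrame has one review per row
df = pd.DataFrame({"text": ["great movie great cast",
                            "not a great movie",
                            "the cast was great"]})

def top_words(texts: pd.Series, n: int = 100) -> list[tuple[str, int]]:
    """Count words across the whole column, not per cell (hypothetical helper)."""
    # Join all rows into one string, lowercase it, and split on whitespace
    words = " ".join(texts).lower().split()
    # most_common(n) returns (word, count) pairs, highest count first
    return Counter(words).most_common(n)

print(top_words(df["text"], n=1))  # [('great', 4)]
```

Because all rows are joined before counting, a word appearing in several reviews is counted once per occurrence across the entire column.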

3 Answers

旧巷少年郎 (OP)

2020-12-05 04:11
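Along with @Joran's solution, you could also use `series.value_counts` for large amounts of text/rows:
```
import pandas as pd
# Join all rows, lowercase, split on whitespace; value_counts() sorts by count, descending
pd.Series(' '.join(df['text']).lower().split()).value_counts().head(100)
```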
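As a hedged aside (not part of the answer, and not benchmarked here), the same counts can be obtained without building one large joined string by splitting each cell and exploding the lists into one row per word. This sketch assumes whitespace tokenization as above; unlike the `' '.join(...)` version, it skips missing values instead of raising. The helper name `top_words_explode` is hypothetical.

```python
import pandas as pd

def top_words_explode(texts: pd.Series, n: int = 100) -> pd.Series:
    """Return the n most frequent whitespace-separated words across a column."""
    return (
        texts.str.lower()   # lowercase each cell; missing cells stay missing
        .str.split()        # split each cell on whitespace into a list
        .explode()          # one row per word; empty lists become NaN
        .value_counts()     # counts sorted high to low; NaN dropped by default
        .head(n)
    )

df = pd.DataFrame({"text": ["The cat sat", "the dog", None]})
print(top_words_explode(df["text"], n=1))
```

The result is a Series indexed by word, the same shape `value_counts()` returns in the answer's one-liner.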
From the benchmarks below, `series.value_counts` seems to be roughly 1.6x faster than the `Counter` method (27.1 ms vs. 44.2 ms per loop).

For the Movie Reviews dataset of 3,000 rows, totaling 400K characters and 70K words:
```
In [448]: %timeit Counter(" ".join(df.text).lower().split()).most_common(100)
10 loops, best of 3: 44.2 ms per loop

In [449]: %timeit pd.Series(' '.join(df.text).lower().split()).value_counts()[:100]
10 loops, best of 3: 27.1 ms per loop
```
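A rough way to rerun this comparison outside IPython uses the standard-library `timeit` module. The corpus here is synthetic (3,000 rows of 25 random tokens drawn from a hypothetical 5,000-word vocabulary, about 75K words), not the Movie Reviews data, so absolute timings and the ratio between the two methods may differ from the figures above.

```python
import random
import timeit
from collections import Counter

import pandas as pd

# Hypothetical synthetic corpus (NOT the Movie Reviews dataset)
random.seed(0)
vocab = [f"word{i}" for i in range(5_000)]
df = pd.DataFrame({"text": [" ".join(random.choices(vocab, k=25))
                            for _ in range(3_000)]})

def with_counter():
    # Counter method: (word, count) pairs, most frequent first
    return Counter(" ".join(df.text).lower().split()).most_common(100)

def with_value_counts():
    # value_counts method: Series of counts indexed by word, sorted descending
    return pd.Series(" ".join(df.text).lower().split()).value_counts().head(100)

for fn in (with_counter, with_value_counts):
    # Best of 3 repeats of 10 calls each, mirroring "10 loops, best of 3"
    best = min(timeit.repeat(fn, repeat=3, number=10)) / 10
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per loop")
```

Both functions return the same counts; words with equal counts may appear in a different order, since the two methods break ties differently.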