I am parsing a long string of text and calculating the number of times each word occurs in Python. I have a function that works, but I am looking for advice on whether there is a more idiomatic way to do it.
Use collections.Counter:
>>> from collections import Counter
>>> test = 'abc def abc def zzz zzz'
>>> Counter(test.split()).most_common()
[('abc', 2), ('zzz', 2), ('def', 2)]
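If the text contains punctuation or mixed case, split() alone can give surprising counts ('Word' and 'word,' end up as separate keys). A minimal sketch, assuming you want case-insensitive counts and want to treat any run of word characters as a word:

from collections import Counter
import re

text = "Abc, def ABC def zzz. Zzz"
# \w+ pulls out runs of word characters; lower() folds case
words = re.findall(r"\w+", text.lower())
print(Counter(words).most_common())
# [('abc', 2), ('def', 2), ('zzz', 2)]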
You can also use NLTK
(the Natural Language Toolkit). It provides very nice libraries for studying and processing text.
For this example you can use:
from nltk import FreqDist
text = "aa bb cc aa bb"
fdist1 = FreqDist(text.split())  # split into words; FreqDist on the raw string would count characters
# show the 10 most frequent words in the text
print(fdist1.most_common(10))
The result will be:
[('aa', 2), ('bb', 2), ('cc', 1)]
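Note that FreqDist needs an iterable of words, so the text has to be tokenized first; text.split() is enough for simple strings, and for real prose NLTK's own word_tokenize handles punctuation better (it needs the 'punkt' tokenizer data downloaded once). A rough sketch of that variant:

from nltk import FreqDist
from nltk.tokenize import word_tokenize
# nltk.download('punkt')  # one-time download of the tokenizer data

text = "The cat sat on the mat. The cat slept."
# lowercase and keep only alphabetic tokens so 'The'/'the' merge and '.' is dropped
fdist = FreqDist(w.lower() for w in word_tokenize(text) if w.isalpha())
print(fdist.most_common(3))
# [('the', 3), ('cat', 2), ('sat', 1)] -- order of the count-1 ties may vary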
>>>> test = """abc def-ghi jkl abc
abc"""
>>> from collections import Counter
>>> words = Counter()
>>> words.update(test.split()) # Update counter with words
>>> words.most_common() # List words from most common to least common
[('abc', 3), ('jkl', 1), ('def-ghi', 1)]
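Because update() can be called repeatedly, this pattern also works when the text is too long to hold comfortably in one string, e.g. feeding the counter one line at a time from a file. A minimal sketch (the file name is just a placeholder):

from collections import Counter

words = Counter()
# 'big_text.txt' is a hypothetical file name; counting line by line
# means the whole file never has to sit in memory at once
with open("big_text.txt") as fh:
    for line in fh:
        words.update(line.split())

print(words.most_common(10))  # the 10 most frequent words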