Efficiently calculate word frequency in a string

遥遥无期 2020-12-10 12:19

I am parsing a long string of text and calculating the number of times each word occurs in Python. I have a function that works but I am looking for advice on whether there

3 Answers
  • 2020-12-10 12:45

    Use collections.Counter:

    >>> from collections import Counter
    >>> test = 'abc def abc def zzz zzz'
    >>> Counter(test.split()).most_common()
    [('abc', 2), ('def', 2), ('zzz', 2)]
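
For comparison, here is a minimal sketch of the same count done with a plain dict, which is roughly the bookkeeping Counter handles for you. The helper name count_words is hypothetical and not part of any library; it assumes words are separated by whitespace only.

```python
from collections import Counter


def count_words(text):
    """Count whitespace-separated words with a plain dict (hypothetical helper)."""
    counts = {}
    for word in text.split():
        # dict.get returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


test = 'abc def abc def zzz zzz'
counts = count_words(test)

# The plain-dict result should hold the same counts as Counter
same_as_counter = counts == dict(Counter(test.split()))
print(counts)
print(same_as_counter)
```

Both versions make a single pass over the words, so Counter is mainly a convenience: it adds most_common() and treats missing keys as zero.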
    
  • 2020-12-10 12:48

    You can also use NLTK (Natural Language Toolkit). It provides very nice libraries for studying and processing text. For this example you can use:

    from nltk import FreqDist
    
    text = "aa bb cc aa bb"
    # FreqDist counts the items of an iterable, so split the string into words first
    fdist1 = FreqDist(text.split())
    
    # show the 10 most frequent words in the text
    print(fdist1.most_common(10))
    

    The result will be:

    [('aa', 2), ('bb', 2), ('cc', 1)]
    
  • 2020-12-10 12:53

    >>> test = """abc def-ghi jkl abc
    ... abc"""
    >>> from collections import Counter
    >>> words = Counter()
    >>> words.update(test.split()) # Update counter with words
    >>> words.most_common()        # Print list with most common to least common
    [('abc', 3), ('def-ghi', 1), ('jkl', 1)]
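
Note that splitting on whitespace keeps 'def-ghi' as a single word and treats 'abc' and 'ABC' as different words. If that is not what you want, one option is to tokenize with a regular expression before counting. This is only a sketch: the helper name tokenize and the pattern are assumptions chosen for illustration, and the sample text is changed to end in 'ABC' to show the case folding.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters, digits and apostrophes (hypothetical helper)."""
    return re.findall(r"[a-z0-9']+", text.lower())


test = """abc def-ghi jkl abc
ABC"""

# Counter accepts any iterable of hashable items, including a list of tokens
words = Counter(tokenize(test))
print(words.most_common())
```

With this pattern the hyphen acts as a separator, so 'def' and 'ghi' are counted separately. Adjust the character class if your text contains accented letters or other word characters.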
    