Word frequency analysis in Python returning letter frequency

野的像风 2021-01-27 01:25

I followed examples from other Stack Overflow posts on word frequency analysis in Python, but my program returns letter frequencies instead of word frequencies.

3 answers
  •  忘掉有多难
    2021-01-27 01:38

    You can use a regex to find all the words (rather than going character by character, which is what you are getting now):

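    import re
    from collections import Counter
    
    ...
    
    # Count whole words (runs of word characters), not individual characters
    commonWords = Counter(m.group(1) for m in re.finditer(r'\b(\w+)\b', contents))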
    

    You can use contents.split() to split the text on whitespace, but that will not separate words from punctuation: you would get separate counts for "word", "word," and "word.", which the regex avoids.
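
To make the difference concrete, here is a minimal sketch comparing the three ways of counting. The sample string is made up for illustration and stands in for the file contents from the question; re.findall(r'\w+', ...) is used as a shorter way of collecting the same word matches as the finditer call above.

```python
import re
from collections import Counter

# Hypothetical sample text, standing in for the file contents in the question.
contents = "word word, word. other word"

# Iterating over a string yields single characters, so this counts letters
# (and spaces and punctuation), not words.
letter_counts = Counter(contents)

# split() separates on whitespace only, so punctuation stays attached:
# "word", "word," and "word." are counted as three different keys.
split_counts = Counter(contents.split())

# The regex collects runs of word characters, dropping the punctuation,
# so every occurrence of "word" lands under the same key.
word_counts = Counter(re.findall(r"\w+", contents))

print(letter_counts.most_common(3))
print(split_counts)
print(word_counts)
```

If the original program passes the raw string (or iterates over it) when building the Counter, that would explain the letter counts, though this is only a guess since the question's code is not shown here.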
