How to create a frequency list of every word in a file?

心在旅途 2020-12-04 09:59

I have a file like this:

This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.

I

11 answers
  •  Happy的楠姐
    2020-12-04 10:55

    Let's do it in Python 3!

    """Count the frequency of each word in the given text.
    
    Words are tokens separated by whitespace; punctuation and other symbols are
    stripped; matching is case-insensitive. Input is read from stdin or from a
    file given as the first argument. The most frequent words are printed first.
    """
    
    # Case-insensitive
    # Ignore punctuation `~!@#$%^&*()_-+={}[]\|:;"'<>,.?/
    
    import sys
    
    # Read from stdin when no file argument is given, otherwise from that file
    lines = None
    if len(sys.argv) == 1:
        lines = sys.stdin
    else:
        lines = open(sys.argv[1])
    
    D = {}
    for line in lines:
        for word in line.split():
            # Strip punctuation characters from the token
            word = ''.join(list(filter(
                lambda ch: ch not in "`~!@#$%^&*()_-+={}[]\\|:;\"'<>,.?/",
                word)))
            word = word.lower()
            if not word:
                # The token was punctuation only (e.g. "--"); do not count it
                continue
            if word in D:
                D[word] += 1
            else:
                D[word] = 1
    
    # Close the file if we opened one
    if lines is not sys.stdin:
        lines.close()
    
    # Print the words, highest count first
    for word in sorted(D, key=D.get, reverse=True):
        print(word + ' ' + str(D[word]))
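
The same counting can be written more compactly with the standard library. The sketch below is an alternative, not the script above: it assumes `collections.Counter` for the tallying and `str.translate` with `string.punctuation` (the same 32 ASCII symbols the script filters out) for the stripping. `word_frequencies` and `STRIP_PUNCT` are illustrative names, and the three-argument `str.maketrans` makes it Python 3 only.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
# (Illustrative name; string.punctuation covers the same symbols as the
# hand-written list in the script.)
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def word_frequencies(lines):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = token.translate(STRIP_PUNCT).lower()
            if word:  # skip tokens that consisted only of punctuation
                counts[word] += 1
    return counts


# Sample text taken from the question
sample = [
    "This is a file with many words.",
    "Some of the words appear more than once.",
    "Some of the words only appear one time.",
]

for word, count in word_frequencies(sample).most_common():
    print(word, count)
```

Called with no argument, `most_common()` returns every word with the highest count first, which mirrors the `sorted(..., reverse=True)` loop in the script.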
    

    Let's name this script "frequency.py" and add a line to "~/.bash_aliases":

    alias freq="python3 /path/to/frequency.py"
    

    Now, to list the word frequencies in your file "content.txt", run:

    freq content.txt
    

    You can also pipe another command's output into it:

    cat content.txt | freq
    

    And even analyze text from multiple files:

    cat content.txt story.txt article.txt | freq
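
If you would rather not go through `cat`, Python's standard `fileinput` module can walk several files in sequence and falls back to stdin when the list of files is empty. A minimal sketch under that assumption follows; `count_files` and `clean` are hypothetical helper names, not part of the script above, and the code targets Python 3.

```python
import fileinput
import string
from collections import Counter

# Deletes ASCII punctuation, the same symbols the script strips by hand.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def clean(token):
    """Strip punctuation from a token and fold it to lower case."""
    return token.translate(STRIP_PUNCT).lower()


def count_files(paths):
    """Count words across all files in paths; an empty list means stdin."""
    counts = Counter()
    # fileinput.input() yields the lines of each file in turn
    with fileinput.input(files=paths) as lines:
        for line in lines:
            for token in line.split():
                word = clean(token)
                if word:  # ignore tokens that were only punctuation
                    counts[word] += 1
    return counts
```

Calling `count_files(sys.argv[1:])` (after `import sys`) and printing `most_common()` would then cover both a list of file names on the command line and piped input.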
    

    If you are using Python 2, just replace

    • ''.join(list(filter(args...))) with filter(args...)
    • python3 with python
    • print(whatever) with print whatever
