How to create a frequency list of every word in a file?

前端未结

关注

 11  966

心在旅途 2020-12-04 09:59

I have a file like this:

This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.

11条回答

Happy的楠姐 (楼主)

2020-12-04 10:55

Let's do it in Python 3!

"""Counts the frequency of each word in the given text; words are defined as
entities separated by whitespaces; punctuations and other symbols are ignored;
case-insensitive; input can be passed through stdin or through a file specified
as an argument; prints highest frequency words first"""

# Case-insensitive
# Ignore punctuations `~!@#$%^&*()_-+={}[]\|:;"'<>,.?/

import sys

# Find if input is being given through stdin or from a file
lines = None
if len(sys.argv) == 1:
    lines = sys.stdin
else:
    lines = open(sys.argv[1])

D = {}
for line in lines:
    for word in line.split():
        word = ''.join(list(filter(
            lambda ch: ch not in "`~!@#$%^&*()_-+={}[]\\|:;\"'<>,.?/",
            word)))
        word = word.lower()
        if word in D:
            D[word] += 1
        else:
            D[word] = 1

for word in sorted(D, key=D.get, reverse=True):
    print(word + ' ' + str(D[word]))

Let's name this script "frequency.py" and add a line to "~/.bash_aliases":

alias freq="python3 /path/to/frequency.py"

Now to find the frequency words in your file "content.txt", you do:

freq content.txt

You can also pipe output to it:

cat content.txt | freq

And even analyze text from multiple files:

cat content.txt story.txt article.txt | freq

If you are using Python 2, just replace

''.join(list(filter(args...))) with filter(args...)
python3 with python
print(whatever) with print whatever

0 讨论(0)

查看其它11个回答