Counting specific characters in a file (Python)

Question

I'd like to count specific strings in a file, e.g. how many times "--undefined--" appears. Here is a piece of the file's content:

"jo:ns  76.434
pRE     75.417
zi:     75.178
dEnt    --undefined--
ba      --undefined--

I tried something like this, but it doesn't work:

with open("v3.txt", 'r') as infile:
    data = infile.readlines().decode("UTF-8")

    count = 0
    for i in data:
        if i.endswith("--undefined--"):
            count += 1
    print count

Do I have to implement, say, a dictionary of tuples to tackle this, or is there an easier solution?

EDIT:

The word in question appears only once in a line.

Answer 1:

readlines() returns the list of lines, but they are not stripped (i.e. they still contain the newline character). Either strip them first:

data = [line.strip() for line in data]

or check for --undefined--\n:

if line.endswith("--undefined--\n"):

Alternatively, consider string's .count() method:

file_contents.count("--undefined--")
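
To make the difference concrete, here is a minimal sketch (Python 3) of the three variants above. The sample lines are hypothetical, modelled on the excerpt in the question, and the variable names are only for illustration:

```python
# Sample lines as readlines() would return them (hypothetical data
# modelled on the excerpt in the question).
lines = [
    "jo:ns  76.434\n",
    "pRE     75.417\n",
    "dEnt    --undefined--\n",
    "ba      --undefined--\n",
]

TARGET = "--undefined--"

# Without stripping, the trailing "\n" makes endswith() fail on every line.
unstripped_count = sum(1 for line in lines if line.endswith(TARGET))

# rstrip() removes the trailing newline (and any other trailing whitespace).
stripped_count = sum(1 for line in lines if line.rstrip().endswith(TARGET))

# str.count() on the joined text counts every occurrence, wherever it appears.
total_occurrences = "".join(lines).count(TARGET)

print(unstripped_count, stripped_count, total_occurrences)
```

Since the word appears at most once per line here, the stripped endswith() count and the .count() result agree.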

Answer 2:

You can read all the data into one string, split that string into a list, and count the occurrences of the token in that list.

with open('afile.txt', 'r') as myfile:
    data=myfile.read().replace('\n', ' ')

data.split(' ').count("--undefined--")

or directly on the string:

data.count("--undefined--")
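
Note that the two calls are not equivalent: the list version only counts whole whitespace-separated tokens, while str.count() counts the substring anywhere. A small sketch with made-up text (the third line is a hypothetical edge case, not taken from the question), assuming Python 3:

```python
# Hypothetical sample text; the last line embeds the marker inside a
# longer token to show where the two approaches diverge.
text = (
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
    "zz      pre--undefined--post\n"
)

TARGET = "--undefined--"

# split() with no argument splits on any run of whitespace, including
# newlines, so no replace('\n', ' ') is needed beforehand.
token_count = text.split().count(TARGET)

# str.count() matches the substring anywhere, even inside another token.
substring_count = text.count(TARGET)

print(token_count, substring_count)
```

Which one you want depends on whether "--undefined--" can ever occur as part of a longer field in your data.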

Answer 3:

Or don't limit yourself to .endswith(), use the in operator.

count = 0

with open('v3.txt', 'r') as infile:
    # Count every line that contains the marker anywhere, not only at the end
    for line in infile:
        if '--undefined--' in line:
            count += 1

print(count)

Answer 4:

When reading a file line by line, each line ends with the newline character:

>>> with open("blookcore/models.py") as f:
...    lines = f.readlines()
... 
>>> lines[0]
'# -*- coding: utf-8 -*-\n'
>>>

so your endswith() test just can't work - you have to strip the line first:

if i.strip().endswith("--undefined--"):
    count += 1

Now, reading a whole file into memory is more often than not a bad idea - even if the file fits in memory, it still eats resources for no good reason. Python's file objects are iterable, so you can just loop over your file. And finally, you can specify which encoding should be used when opening the file (instead of decoding manually) using the codecs module (Python 2) or directly (Python 3):

# py3
with open("your/file.text", encoding="utf-8") as f:

# py2:
import codecs
with codecs.open("your/file.text", encoding="utf-8") as f:

then just use the builtin sum and a generator expression:

result = sum(line.strip().endswith("whatever") for line in f)

this relies on the fact that booleans are integers with values 0 (False) and 1 (True).
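
Putting those pieces together, a minimal sketch for Python 3 might look like the following. The helper name count_lines_ending_with and the temporary sample file are assumptions made for the demo, not part of the original question:

```python
import os
import tempfile


def count_lines_ending_with(path, suffix, encoding="utf-8"):
    """Count lines whose stripped text ends with `suffix`, reading lazily."""
    with open(path, encoding=encoding) as f:
        # Iterating over the file yields one line at a time, so the whole
        # file is never held in memory. Each True counts as 1 in sum().
        return sum(line.rstrip().endswith(suffix) for line in f)


# Demo on a throwaway file containing hypothetical sample data.
sample = "jo:ns  76.434\ndEnt    --undefined--\nba      --undefined--\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(sample)

try:
    result = count_lines_ending_with(tmp.name, "--undefined--")
finally:
    os.remove(tmp.name)  # clean up the demo file

print(result)
```

With a real file you would simply call count_lines_ending_with("v3.txt", "--undefined--").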

Answer 5:

Quoting Raymond Hettinger, "There must be a better way":

from collections import Counter

counter = Counter()
words = ('--undefined--', 'otherword', 'onemore')

with open("v3.txt", 'r') as f:
    lines = f.readlines()
    for line in lines:
        for word in words:
            if word in line:
                counter.update((word,))  # note the single element tuple

print counter
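
For Python 3, roughly the same idea can be written more compactly by feeding a generator straight into Counter. This is only a sketch: io.StringIO stands in for the open file, and its contents are made up for the demo:

```python
import io
from collections import Counter

words = ("--undefined--", "otherword", "onemore")

# io.StringIO stands in for an open file; the contents are hypothetical.
f = io.StringIO(
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
    "foo     otherword\n"
)

# One increment per (line, word) pair where the word occurs in the line.
counter = Counter(word for line in f for word in words if word in line)

print(counter)
```

Words that never occur are simply absent from the Counter, and looking them up (e.g. counter["onemore"]) returns 0 rather than raising KeyError.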

Source: https://stackoverflow.com/questions/48885930/counting-specific-characters-in-a-file-python

Tags

python

python-2.7