Counting specific characters in a file (Python)

ぃ、小莉子 submitted on 2021-02-08 07:39:07

Question


I'd like to count specific things from a file, i.e. how many times "--undefined--" appears. Here is a piece of the file's content:

"jo:ns  76.434
pRE     75.417
zi:     75.178
dEnt    --undefined--
ba      --undefined--

I tried something like this, but it doesn't work:

with open("v3.txt", 'r') as infile:
    data = infile.readlines().decode("UTF-8")

    count = 0
    for i in data:
        if i.endswith("--undefined--"):
            count += 1
    print count

Do I have to implement, say, a dictionary of tuples to tackle this, or is there an easier solution?

EDIT:

The word in question appears only once in a line.


Answer 1:


readlines() returns a list of lines, but they are not stripped (i.e. they still contain the newline character). Either strip them first:

data = [line.strip() for line in data]

or check for --undefined--\n:

if line.endswith("--undefined--\n"):

Alternatively, consider string's .count() method:

file_contents.count("--undefined--")
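
As a rough sketch of the three options above, here they are side by side on lines modeled on the question's sample. The text is held in a string rather than read from v3.txt, and the variable names are made up for illustration:

```python
# Lines modeled on the question's sample; splitlines(keepends=True)
# keeps the trailing "\n" on each line, just like readlines() does.
sample = (
    "jo:ns  76.434\n"
    "pRE     75.417\n"
    "zi:     75.178\n"
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
)
lines = sample.splitlines(keepends=True)

# Option 1: strip each line before testing how it ends.
stripped_count = sum(1 for line in lines if line.strip().endswith("--undefined--"))

# Option 2: include the newline in the suffix. This would miss a final
# line that has no trailing newline.
newline_count = sum(1 for line in lines if line.endswith("--undefined--\n"))

# Option 3: count occurrences in the whole text at once.
substring_count = sample.count("--undefined--")

print(stripped_count, newline_count, substring_count)  # 2 2 2
```

All three agree here only because the word appears at most once per line and every line ends with a newline.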



Answer 2:


You can read all the data into one string, split the string into a list, and count occurrences of the substring in that list.

with open('afile.txt', 'r') as myfile:
    data=myfile.read().replace('\n', ' ')

data.split(' ').count("--undefined--")

or directly from the string:

data.count("--undefined--")
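
The two calls are not quite equivalent, which may matter depending on the data. A small sketch, assuming a made-up third line where the marker is embedded in a longer token (that line is not from the question):

```python
# Two lines from the question plus one invented line ("foo ...") where
# the marker is part of a longer token.
text = (
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
    "foo     x--undefined--y\n"
)

# split() with no argument splits on any run of whitespace, newlines
# included, so no replace('\n', ' ') step is needed here.
tokens = text.split()

# list.count() only matches tokens that are exactly equal to the marker.
token_count = tokens.count("--undefined--")

# str.count() matches the marker anywhere, even inside another token.
substring_count = text.count("--undefined--")

print(token_count, substring_count)  # 2 3
```

For the file in the question both give the same number, so the choice only matters if the marker can show up inside other text.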



Answer 3:


Or don't limit yourself to .endswith(), use the in operator.

count = 0

with open('v3.txt', 'r') as infile:
    data = infile.readlines()

# Count every line that contains the marker anywhere in it.
for line in data:
    if '--undefined--' in line:
        count += 1

print(count)
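
To see why the in operator is more forgiving than .endswith() here, a short sketch with an in-memory list standing in for the result of readlines() (the trailing spaces on the second line are invented for illustration):

```python
# Stand-in for readlines() output: each line still carries its "\n",
# and the second one also has trailing spaces.
lines = [
    "dEnt    --undefined--\n",
    "ba      --undefined--  \n",
    "zi:     75.178\n",
]

# endswith() fails on both marker lines because of the trailing characters.
endswith_count = sum(line.endswith("--undefined--") for line in lines)

# "in" does not care what follows the marker.
in_count = sum("--undefined--" in line for line in lines)

print(endswith_count, in_count)  # 0 2
```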



Answer 4:


When reading a file line by line, each line ends with the newline character:

>>> with open("blookcore/models.py") as f:
...    lines = f.readlines()
... 
>>> lines[0]
'# -*- coding: utf-8 -*-\n'
>>> 

so your endswith() test just can't work - you have to strip the line first:

if i.strip().endswith("--undefined--"):
    count += 1

Now, reading a whole file into memory is more often than not a bad idea: even if the file fits in memory, it still eats resources for no good reason. Python's file objects are iterable, so you can just loop over your file. And finally, you can specify which encoding should be used when opening the file (instead of decoding manually), using the codecs module (Python 2) or directly (Python 3):

# py3
with open("your/file.text", encoding="utf-8") as f:
    ...  # loop over f here

# py2:
import codecs
with codecs.open("your/file.text", encoding="utf-8") as f:
    ...  # loop over f here

Then just use the built-in sum() and a generator expression:

result = sum(line.strip().endswith("whatever") for line in f)

This relies on the fact that booleans are integers with values 0 (False) and 1 (True).
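
Putting the pieces of this answer together, a minimal Python 3 sketch might look like the following. The helper name count_lines_ending_with is hypothetical, and the sample is written to a temporary file only so that the snippet runs on its own:

```python
import os
import tempfile


def count_lines_ending_with(path, suffix, encoding="utf-8"):
    """Count lines whose stripped text ends with suffix, one line at a time."""
    with open(path, encoding=encoding) as f:
        # Each test yields True (1) or False (0); sum() adds them up.
        return sum(line.strip().endswith(suffix) for line in f)


# Lines modeled on the question's sample, written to a throwaway file.
sample = (
    "jo:ns  76.434\n"
    "pRE     75.417\n"
    "zi:     75.178\n"
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
)
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write(sample)

result = count_lines_ending_with(tmp.name, "--undefined--")
os.remove(tmp.name)  # clean up the temporary file

print(result)  # 2
```

Because the file is iterated lazily inside the with block, only one line is held in memory at a time.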




Answer 5:


Quoting Raymond Hettinger, "There must be a better way":

from collections import Counter

counter = Counter()
words = ('--undefined--', 'otherword', 'onemore')

with open("v3.txt", 'r') as f:
    lines = f.readlines()
    for line in lines:
        for word in words:
            if word in line:
                counter.update((word,))  # note the single element tuple

print(counter)
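
A self-contained variant of the same idea, as a sketch only: io.StringIO stands in for the open file, and the extra search words "75." and "missing" are invented for the example:

```python
import io
from collections import Counter

# Lines modeled on the question's sample.
sample = (
    "jo:ns  76.434\n"
    "pRE     75.417\n"
    "zi:     75.178\n"
    "dEnt    --undefined--\n"
    "ba      --undefined--\n"
)
# "75." and "missing" are made-up search words for illustration.
words = ("--undefined--", "75.", "missing")

counter = Counter()
with io.StringIO(sample) as f:  # behaves like an open text file
    for line in f:  # iterate lazily instead of calling readlines()
        for word in words:
            if word in line:
                counter[word] += 1

print(counter["--undefined--"], counter["75."], counter["missing"])  # 2 2 0
```

Note that this counts the lines containing each word, not the total number of occurrences. If a word could appear more than once per line, adding line.count(word) instead of 1 would be one way to handle that. A Counter also returns 0 for keys it has never seen, so no special-casing is needed for words that never appear.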


Source: https://stackoverflow.com/questions/48885930/counting-specific-characters-in-a-file-python
