How to read file in reverse order in python3.2 without reading the whole file to memory? [duplicate]

谁说我不能喝 submitted on 2019-12-22 08:34:11

Question


I am parsing log files of 1 to 10 GB with Python 3.2. I need to search for lines matching a specific regex (a kind of timestamp), and I want to find the last occurrence.

I have tried to use:

for line in reversed(list(open("filename"))):

which resulted in very bad performance (in the good cases) and MemoryError in the bad cases.

In the thread "Read a file in reverse order using python" I did not find any good answer.

I found the following solution, "python head, tail and backward read by lines of a text file", very promising; however, it does not work in Python 3.2 and fails with the error:

NameError: name 'file' is not defined

I later tried to replace File(file) with File(TextIOWrapper), as this is the type of object the builtin function open() returns; however, that resulted in several more errors (I can elaborate if someone suggests this is the right way :)).


Answer 1:


This is a function that does what you're looking for:

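def reverse_lines(filename, BUFSIZE=4096, encoding="utf-8"):
    # Binary mode gives plain byte offsets for seek/tell; in Python 3 the
    # data read is bytes, so the newline and remainder must be bytes too.
    # The utf-8 default is an assumption; pass the file's real encoding.
    with open(filename, "rb") as f:
        f.seek(0, 2)       # jump to the end of the file
        p = f.tell()
        remainder = b""    # start of a line that continues into the earlier chunk
        while True:
            sz = min(BUFSIZE, p)
            p -= sz
            f.seek(p)
            buf = f.read(sz) + remainder
            if b"\n" not in buf:
                remainder = buf
            else:
                i = buf.index(b"\n")
                # Everything after the first newline is made of complete lines.
                for L in buf[i+1:].split(b"\n")[::-1]:
                    yield L.decode(encoding)
                remainder = buf[:i]
            if p == 0:
                break
        yield remainder.decode(encoding)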

It works by reading a buffer from the end of the file (4 KB by default) and generating all the lines in it in reverse. It then moves back by 4 KB and does the same until it reaches the beginning of the file. The code may need to keep more than 4 KB in memory if the section being processed contains no linefeed (very long lines).

You can use the code like this:

for L in reverse_lines("my_big_file"):
    ...  # process L
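
For the goal stated in the question (finding the last line that matches a timestamp regex), the same backward-chunk idea can be combined with a bytes regex so that the scan stops at the first hit from the end. The following is only a minimal sketch under stated assumptions: `iter_lines_backward` and `find_last_match` are names invented for illustration, lines are assumed to end in `\n`, and lines are kept as bytes so no encoding has to be guessed.

```python
import os
import re


def iter_lines_backward(path, bufsize=4096):
    """Yield the lines of `path` as bytes, last line first, without newlines."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        tail = b""  # start of a line whose beginning lies in an earlier chunk
        while pos > 0:
            size = min(bufsize, pos)
            pos -= size
            f.seek(pos)
            chunk = f.read(size) + tail
            lines = chunk.split(b"\n")
            tail = lines[0]  # possibly incomplete; finished on the next pass
            for line in reversed(lines[1:]):
                yield line
        yield tail  # the first line of the file


def find_last_match(path, pattern, bufsize=4096):
    """Return the last line matching the bytes regex `pattern`, or None."""
    regex = re.compile(pattern)
    for line in iter_lines_backward(path, bufsize):
        if regex.search(line):
            return line  # stop early: nothing before this line is read
    return None
```

A call might look like `find_last_match("my_big_file", rb"^\d{4}-\d{2}-\d{2}")`, where the pattern is a hypothetical timestamp format. If the file ends with a newline, the first item yielded is an empty line, which simply fails to match.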



Answer 2:


If you don't want to read the whole file you can always use seek. Here is a demo:

 $ cat words.txt 
foo
bar
baz
[6] oz123b@debian:~ $ ls -l words.txt 
-rw-r--r-- 1 oz123 oz123 12 Mar  9 19:38 words.txt

The file size is 12 bytes. You can skip to the last entry by moving the cursor 8 bytes forward:

In [3]: w=open("words.txt")
In [4]: w.seek(8)
In [5]: w.readline()
Out[5]: 'baz\n'

To complete my answer, here is how you print these lines in reverse:

 w=open('words.txt')

In [6]: for s in [8, 4, 0]:
   ...:     _= w.seek(s)
   ...:     print(w.readline().strip())
   ...:     
baz
bar
foo

You will have to explore your file's data structure and the size of each line. Mine was quite simple because it was meant to demonstrate the principle.
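
When every line occupies the same number of bytes, as in the words.txt demo (three 4-byte lines, newline included), the offsets 8, 4, 0 do not have to be hard-coded; they can be derived from the file size. Here is a minimal sketch under that fixed-width assumption. `reverse_fixed_width` and `record_len` are illustrative names, and the ASCII decoding is also an assumption that fits the demo file only.

```python
import os


def reverse_fixed_width(path, record_len):
    """Yield lines last-to-first, assuming each line (newline included)
    is exactly `record_len` bytes long."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Start offsets of the records, from the last one back to offset 0.
        for offset in range(size - record_len, -1, -record_len):
            f.seek(offset)
            record = f.read(record_len)
            # Assumption: the content is plain ASCII, as in the demo file.
            yield record.rstrip(b"\n").decode("ascii")
```

For the 12-byte words.txt, `reverse_fixed_width("words.txt", 4)` visits offsets 8, 4 and 0, matching the manual loop above. For real log files with variable-length lines this shortcut does not apply, and a chunked backward scan is needed instead.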



Source: https://stackoverflow.com/questions/22286332/how-to-read-file-in-reverse-order-in-python3-2-without-reading-the-whole-file-to
