Cheap way to search a large text file for a string

隐瞒了意图╮ 2020-11-27 04:15

I need to search a pretty large text file for a particular string. It's a build log with about 5000 lines of text. What's the best way to go about doing that? Using regex sho

9 Answers
  •  广开言路
    2020-11-27 05:03

    The following function works for both text files and binary files, although it only returns the match position as a byte offset. Its benefit is that it finds strings that span a line or buffer boundary, which a line-by-line or buffer-by-buffer search would miss.

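    import os

    def fnd(fname, s, start=0):
        # s must be bytes (the file is opened in binary mode); returns a byte offset or -1.
        bsize = 4096
        overlap = len(s) - 1
        if not 0 <= overlap < bsize:
            raise ValueError('s must be non-empty and no longer than the buffer size')
        with open(fname, 'rb') as f:
            fsize = os.path.getsize(fname)
            f.seek(start)
            first = True
            while True:
                # After the first read, step back len(s) - 1 bytes so a match
                # that straddles two buffers is seen whole. Skipping this on the
                # first read keeps matches that begin before `start` out of the result.
                if not first and f.tell() < fsize:
                    f.seek(f.tell() - overlap)
                first = False
                buffer = f.read(bsize)
                if not buffer:
                    return -1  # reached end of file without a match
                pos = buffer.find(s)
                if pos >= 0:
                    # f.tell() - len(buffer) is the file offset where this buffer began
                    return f.tell() - (len(buffer) - pos)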
    

    The idea behind this is:

    • seek to a start position in file
    • read from the file into a buffer (the search string has to be smaller than the buffer size); if not at the beginning, first drop back len(s) - 1 bytes, to catch a string that started at the end of the last buffer and continues in the next one
    • return position or -1 if not found
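
    To make the overlap step concrete, here is a minimal sketch that compares a search without overlap against one with it. The names naive_chunk_find and overlap_chunk_find are made up for this illustration, and the second one keeps the last len(needle) - 1 bytes in memory instead of seeking back in the file, which is a variation on the function above rather than the same code. The 8-byte buffer is deliberately tiny so that the match straddles a boundary.

```python
import io

def naive_chunk_find(stream, needle, bsize):
    """Search each chunk in isolation; misses matches that straddle two chunks."""
    offset = 0  # absolute offset of the first byte of the current chunk
    while True:
        chunk = stream.read(bsize)
        if not chunk:
            return -1
        pos = chunk.find(needle)
        if pos >= 0:
            return offset + pos
        offset += len(chunk)

def overlap_chunk_find(stream, needle, bsize):
    """Carry the last len(needle) - 1 bytes into the next chunk before searching."""
    overlap = len(needle) - 1
    tail = b""
    offset = 0  # absolute offset of the first byte of `tail + chunk`
    while True:
        chunk = stream.read(bsize)
        if not chunk:
            return -1
        window = tail + chunk
        pos = window.find(needle)
        if pos >= 0:
            return offset + pos
        # Keep only the bytes that could begin a match completed by the next chunk.
        tail = window[-overlap:] if overlap > 0 else b""
        offset += len(window) - len(tail)

# "ERROR" occupies bytes 6..10, so it crosses the chunk boundary at byte 8.
data = b"aaaaaaERROR_bbbb"
needle = b"ERROR"
naive_result = naive_chunk_find(io.BytesIO(data), needle, 8)
overlap_result = overlap_chunk_find(io.BytesIO(data), needle, 8)
print(naive_result, overlap_result)
```

    The naive version reports no match because neither 8-byte chunk contains the whole string, while the overlapping version reports offset 6.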

    I used something like this to find signatures of files inside larger ISO9660 files. It was quite fast and did not use much memory; you can also use a larger buffer to speed things up.
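
    For scanning a file for every occurrence of a signature, a generator with a configurable buffer size is one possible shape. This is only a sketch under assumptions: iter_offsets and its bsize parameter are hypothetical names, the b"CD001" marker and the temporary file are stand-ins for real data, and the buffer handling carries a tail in memory rather than seeking back.

```python
import os
import tempfile

def iter_offsets(fname, needle, bsize=1 << 20):
    """Yield the byte offset of every occurrence of `needle` (bytes) in the file."""
    if not needle or bsize < len(needle):
        raise ValueError("needle must be non-empty and no longer than bsize")
    overlap = len(needle) - 1
    with open(fname, "rb") as f:
        tail = b""
        base = 0  # file offset of window[0]
        while True:
            chunk = f.read(bsize)
            if not chunk:
                return
            window = tail + chunk
            pos = window.find(needle)
            while pos >= 0:
                yield base + pos
                pos = window.find(needle, pos + 1)
            # The tail is shorter than the needle, so no match is reported twice.
            tail = window[-overlap:] if overlap > 0 else b""
            base += len(window) - len(tail)

# Stand-in data: the first marker straddles the 4096-byte boundary.
payload = b"\x00" * 4094 + b"CD001" + b"\x00" * 3000 + b"CD001" + b"\x00" * 10
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

small = list(iter_offsets(tmp.name, b"CD001", bsize=4096))
large = list(iter_offsets(tmp.name, b"CD001", bsize=1 << 20))
os.remove(tmp.name)
print(small, large)
```

    Both buffer sizes should report the same offsets; the larger buffer just needs fewer read calls, at the cost of holding more data in memory at once.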
