I need to search a pretty large text file for a particular string. It's a build log with about 5000 lines of text. What's the best way to go about doing that? Using regex sho
This is entirely inspired by laurasia's answer above, but it refines the structure.
It also adds some checks:
- It returns 0 when searching an empty file for the empty string. In laurasia's answer, this is an edge case that returns -1.
- It raises a ValueError if the goal string is longer than the buffer, since such a string can never fit inside a single read.

In practice, the goal string should be much smaller than the buffer for efficiency, and there are more efficient methods of searching if the size of the goal string is very close to the size of the buffer.
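def fnd(fname, goal, start=0, bsize=4096):
    # The file is opened in binary mode, so goal must be bytes (e.g. b"error").
    if bsize < len(goal):
        raise ValueError("The buffer size must be at least as large as the string being searched for.")
    with open(fname, 'rb') as f:
        if start > 0:
            f.seek(start)
        # Re-read the last len(goal) - 1 bytes so a match straddling two reads is still found.
        overlap = len(goal) - 1
        while True:
            buffer = f.read(bsize)
            pos = buffer.find(goal)
            if pos >= 0:
                return f.tell() - len(buffer) + pos
            # A short read means end of file; stop instead of seeking back forever.
            if len(buffer) < bsize:
                return -1
            f.seek(f.tell() - overlap)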
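
For comparison, here is a minimal sketch of one alternative to chunked reading: memory-mapping the file and letting `mmap.find` do the scan. This is not part of laurasia's answer or the function above. The name `find_mmap` is made up for illustration, and the sketch assumes the file is a regular file that can be mapped and that `goal` is a bytes object.

```python
import mmap
import os

def find_mmap(fname, goal, start=0):
    """Return the byte offset of goal in fname at or after start, or -1."""
    if os.path.getsize(fname) == 0:
        # mmap cannot map an empty file, so handle that case by hand:
        # only the empty string can be found in an empty file.
        return 0 if not goal else -1
    with open(fname, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm.find(goal, start)
```

With this approach there is no buffer size to tune and no overlap bookkeeping, because the operating system pages the file in as needed. For a build log of a few thousand lines, either version should be fast enough.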