I want to run a regular expression on an entire file, but I'd like to avoid reading the whole file into memory at once, as I may be working with rat
This is one way:
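import re

REGEX = r'\d+'  # raw string so the backslash reaches the regex engine intact
with open('/tmp/workfile', 'r') as f:
    for line in f:  # iterating over the file reads one line at a time
        # re.search scans the whole line; re.match only tests the start of it
        print(re.search(REGEX, line))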
Another approach that comes to mind is to use the read(size) and seek(offset) methods of the file object, which let you read a fixed-size portion of the file at a time.
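import os
import re

REGEX = br'\d+'  # bytes pattern, because the file is opened in binary mode
with open('/tmp/workfile', 'rb') as f:  # binary mode keeps read/seek offsets in bytes
    filesize = os.fstat(f.fileno()).st_size  # file objects have no size() method
    # a suitable size that you can determine ahead of time or in the program;
    # max() avoids a zero-length read (and an endless loop) on tiny files
    part = max(1, filesize // 10)
    position = 0
    while position < filesize:
        content = f.read(part)
        print(re.search(REGEX, content))
        position = position + part
        f.seek(position)  # explicit for clarity; read() already advanced this far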
You can also combine the two: create a generator that returns the contents a certain number of bytes at a time, and iterate over those chunks to check your regex. IMO this would be a good approach.
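A rough sketch of that generator idea is below. The names read_chunks and find_in_chunks and the 64 KiB default chunk size are just placeholders I made up for illustration, and the file is opened in binary mode so the pattern has to be a bytes pattern. Note the limitation: a match that straddles two chunks will be split or missed, so this only works as-is if your matches are short relative to the chunk size or you add some overlap handling yourself.

```python
import re


def read_chunks(path, chunk_size=64 * 1024):
    """Yield successive chunks of at most chunk_size bytes from the file at path."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            yield chunk


def find_in_chunks(pattern, path, chunk_size=64 * 1024):
    """Yield every match of pattern found inside each chunk.

    Matches are searched per chunk only, so a match that spans a
    chunk boundary is not reported as a single match.
    """
    regex = re.compile(pattern)
    for chunk in read_chunks(path, chunk_size):
        for match in regex.finditer(chunk):
            yield match.group()
```

You would use it like `for hit in find_in_chunks(br'\d+', '/tmp/workfile'): print(hit)`. Only one chunk is held in memory at a time, so memory use is bounded by chunk_size rather than by the file size.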