I have a file in which lines are separated by a delimiter, say `.`. I want to read this file line by line, where each line ends at a `.`.
Here is a more efficient approach, using FileIO and a bytearray, which I used for parsing a PDF file:
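import io
import re

# the end-of-line chars, separated by a `|` (logical OR)
EOL_REGEX = re.compile(b'\r\n|\r|\n')
# the end-of-file marker
EOF = b'%%EOF'

def readlines(fio):
    buf = bytearray(4096)
    pending = b''  # unfinished last line carried over from the previous chunk
    while True:
        n = fio.readinto(buf)  # number of bytes actually read
        if not n:
            break  # file ended without an EOF marker
        data = pending + bytes(buf[:n])
        eof_at = data.find(EOF)
        if eof_at != -1:
            # yield the lines before the marker, then stop
            yield from EOL_REGEX.split(data[:eof_at])
            return
        if data.endswith(b'\r'):
            # may be the first half of a '\r\n' pair; wait for the next chunk
            pending = data
            continue
        *lines, pending = EOL_REGEX.split(data)
        yield from lines
    if pending:
        yield from EOL_REGEX.split(pending)

with io.FileIO("test.pdf") as fio:
    for line in readlines(fio):
        ...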
The example above also stops at a custom end-of-file marker (%%EOF). If you don't need that, use this:
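import io
import os
import re

# the end-of-line chars, separated by a `|` (logical OR)
EOL_REGEX = re.compile(b'\r\n|\r|\n')

def readlines(fio, size):
    buf = bytearray(4096)
    pending = b''  # unfinished last line carried over from the previous chunk
    while fio.tell() < size:
        n = fio.readinto(buf)  # number of bytes actually read
        if not n:
            break
        data = pending + bytes(buf[:n])
        if data.endswith(b'\r'):
            # may be the first half of a '\r\n' pair; wait for the next chunk
            pending = data
            continue
        *lines, pending = EOL_REGEX.split(data)
        yield from lines
    if pending:
        yield from EOL_REGEX.split(pending)

size = os.path.getsize("test.pdf")
with io.FileIO("test.pdf") as fio:
    for line in readlines(fio, size):
        ...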
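For the delimiter in the question (`.` rather than newline characters), the same chunked idea can be reduced to a plain `bytes.split`. This is only a minimal sketch: the function name `read_records` and its `delimiter` and `chunk_size` parameters are my own, not part of any library, and it assumes the stream was opened in binary mode.

```python
import io


def read_records(stream, delimiter=b".", chunk_size=4096):
    """Yield the byte strings found between occurrences of `delimiter`.

    `read_records` is a hypothetical helper written for this answer.
    """
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    pending = b""  # tail of the previous chunk that has not been terminated yet
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        # Everything except the last piece is a complete record; the last
        # piece may continue in the next chunk, so carry it over.
        *records, pending = (pending + chunk).split(delimiter)
        yield from records
    if pending:
        yield pending  # final record without a trailing delimiter


if __name__ == "__main__":
    # A small chunk size forces records to straddle chunk boundaries.
    demo = io.BytesIO(b"one.two.three")
    print(list(read_records(demo, chunk_size=4)))  # [b'one', b'two', b'three']
```

With a real file you would pass something like `open(path, "rb")` instead of the `BytesIO` object. Carrying `pending` into the next chunk also covers a multi-byte delimiter that is cut in half by a chunk boundary.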