Reading a file with a specified delimiter for newline

遇见更好的自我 2020-12-01 12:43

I have a file in which lines are separated by a custom delimiter, say `.`. I want to read this file line by line, where each "line" ends at an occurrence of `.` rather than at a newline.

3 Answers
  •  鱼传尺愫
    2020-12-01 13:16

    Here is a more efficient answer, using `FileIO` and `bytearray`, which I used for parsing a PDF file:

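    import io
    import re


    # the end-of-line chars, separated by a `|` (regex alternation)
    EOL_REGEX = b'\r\n|\r|\n'

    # the end-of-file marker
    EOF = b'%%EOF'


    def readlines(fio, chunk_size=4096):
        buf = bytearray(chunk_size)
        pending = b''  # incomplete last line carried over between chunks
        done = False
        while not done:
            n = fio.readinto(buf)  # number of bytes actually read, 0 at end of file
            pending += bytes(buf[:n])  # ignore stale bytes left in the buffer
            eof_at = pending.find(EOF)
            done = not n or eof_at != -1
            if eof_at != -1:
                pending = pending[:eof_at]  # drop the marker and what follows
            elif not done and pending.endswith(b'\r'):
                continue  # may be the first half of a `\r\n` pair
            # the last piece may be an incomplete line, so keep it for later
            *lines, pending = re.split(EOL_REGEX, pending)
            yield from lines
        if pending:
            yield pending  # last line without a trailing EOL


    with io.FileIO("test.pdf") as fio:
        for line in readlines(fio):
            ...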
    

    The example above also stops at a custom EOF marker. If you don't need that, use this:

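    import io
    import re


    # the end-of-line chars, separated by a `|` (regex alternation)
    EOL_REGEX = b'\r\n|\r|\n'


    def readlines(fio, chunk_size=4096):
        buf = bytearray(chunk_size)
        pending = b''  # incomplete last line carried over between chunks
        done = False
        while not done:
            n = fio.readinto(buf)  # number of bytes actually read
            done = not n  # 0 bytes means end of file, no size check needed
            pending += bytes(buf[:n])  # ignore stale bytes left in the buffer
            if not done and pending.endswith(b'\r'):
                continue  # may be the first half of a `\r\n` pair
            # the last piece may be an incomplete line, so keep it for later
            *lines, pending = re.split(EOL_REGEX, pending)
            yield from lines
        if pending:
            yield pending  # last line without a trailing EOL


    with io.FileIO("test.pdf") as fio:
        for line in readlines(fio):
            ...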
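
The question asks about an arbitrary delimiter such as `.` rather than the usual end-of-line characters. Below is a minimal sketch of the same chunked-read idea for that case. The function name `read_records`, its parameters, and the sample data are assumptions made for illustration; they do not come from the answer above.

```python
import io


def read_records(stream, delimiter=b".", chunk_size=4096):
    """Yield the pieces of a binary stream separated by `delimiter`."""
    pending = b""  # bytes after the last delimiter seen so far
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        # Everything before the last delimiter is complete. The final piece
        # may continue in the next chunk (or hold part of a long delimiter).
        *records, pending = (pending + chunk).split(delimiter)
        yield from records
    if pending:
        yield pending  # trailing record with no closing delimiter


# Usage with an in-memory stream standing in for a real file
demo = io.BytesIO(b"first line.second line.third")
print(list(read_records(demo, chunk_size=4)))
# prints [b'first line', b'second line', b'third']
```

For a real file, something like `open(path, "rb")` can be passed in place of the `io.BytesIO` object. Because the unfinished tail is carried into the next chunk, a delimiter that straddles a chunk boundary is still detected, at the cost of re-scanning that tail on each read.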
    
