Read blocks from a file object until x bytes from the end

寵の児 提交于 2021-01-07 02:46:42

问题


I need to read chunks of 64KB in loop, and process them, but stop at the end of file minus 16 bytes: the last 16 bytes are a tag metadata.

The file might be super large, so I can't read it all in RAM.

All the solutions I find are a bit clumsy and/or unpythonic.

with open('myfile', 'rb') as f:
    while True:
        block = f.read(65536)
        if not block:
            break
        process_block(block)

If 16 <= len(block) < 65536, it's easy: it's the last block ever. So useful_data = block[:-16] and tag = block[-16:]

If len(block) == 65536, it could mean three things: that the full block is useful data. Or that this 64KB block is in fact the last block, so useful_data = block[:-16] and tag = block[-16:]. Or that this 64KB block is followed by another block of only a few bytes (let's say 3 bytes), so in this case: useful_data = block[:-13] and tag = block[-13:] + last_block[:3].

How to deal with this problem in a nicer way than distinguishing all these cases?

Note:

  • the solution should work for a file opened with open(...), but also for a io.BytesIO() object, or for a distant SFTP opened file (with pysftp).

  • I was thinking about getting the file object size, with

    f.seek(0,2)
    length = f.tell()
    f.seek(0)
    

    Then after each

    block = f.read(65536)
    

    we can know if we are far from the end with length - f.tell(), but again the full solution does not look very elegant.


回答1:


you can just read in every iteration min(65536, L-f.tell()-16)

Something like this:

from pathlib import Path

L = Path('myfile').stat().st_size

with open('myfile', 'rb') as f:
    while True:    
        to_read_length = min(65536, L-f.tell()-16)
        block = f.read(to_read_length)
        process_block(block)
        if f.tell() == L-16
            break

Did not ran this, but hope you get the gist of it.




回答2:


The following method relies only on the fact that the f.read() method returns an empty bytes object upon end of stream (EOS). It thus could be adopted for sockets simply by replacing f.read() with s.recv().

def read_all_but_last16(f):
    rand = random.Random()  #  just for testing
    buf = b''
    while True:
        bytes_read = f.read(rand.randint(1, 40))  # just for testing
        # bytes_read = f.read(65536)
        buf += bytes_read
        if not bytes_read:
            break
        process_block(buf[:-16])
        buf = buf[-16:]
    verify(buf[-16:])

It works by always leaving 16 bytes at the end of buf until EOS, then finally processing the last 16. Note that if there aren't at least 17 bytes in buf then buf[:-16] returns the empty bytes object.



来源:https://stackoverflow.com/questions/64959048/read-blocks-from-a-file-object-until-x-bytes-from-the-end

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!