Python EOF for multi byte requests of file.read()

天命终不由人 2020-12-16 19:23

The Python docs on file.read() state that "An empty string is returned when EOF is encountered immediately." The documentation further states:

2 Answers
  •  离开以前
    2020-12-16 20:03

    You are not thinking with your snake skin on... Python is not C.

    First, a review:

    • st=f.read() reads to EOF, or, if the file was opened in binary mode, to the last byte;
    • st=f.read(n) attempts to read n bytes and in no case returns more than n bytes;
    • st=f.readline() reads one line at a time; the line ends with '\n' or at EOF;
    • st=f.readlines() uses readline() to read all the lines in a file and returns them as a list.
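
    As a quick illustration of the four calls above, here is a minimal sketch that uses an in-memory io.BytesIO object as a stand-in for a file opened in binary mode. The sample data and variable names are made up for the example.

```python
import io

# Made-up sample data; the last line deliberately has no trailing newline.
data = b"first line\nsecond line\nlast line without newline"

f = io.BytesIO(data)       # behaves like a file opened with 'rb'
whole = f.read()           # reads everything up to EOF
at_eof = f.read()          # already at EOF: returns an empty bytes object

f.seek(0)
chunk = f.read(5)          # returns at most 5 bytes

f.seek(0)
line = f.readline()        # one line, including its trailing b"\n"

f.seek(0)
lines = f.readlines()      # list of all lines; the last one ends at EOF

print(whole == data, at_eof, chunk, line, len(lines))
```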

    If a file read method is at EOF, it returns '' (b'' for a binary-mode file in Python 3). The same EOF test is used by other file-like objects such as StringIO, socket.makefile, etc. A return of fewer than n bytes from f.read(n) is most assuredly NOT a dispositive test for EOF! Code that relies on it may work 99.99% of the time, but the times it does not work would be very frustrating to track down. Plus, it is bad Python form. The only use for n in this case is to put an upper limit on the size of the return.
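
    To see a short read that is not EOF, here is a small sketch using an OS pipe. It assumes an unbuffered reader (buffering=0), so each read() call maps to a single OS-level read; a buffered reader may behave differently.

```python
import os

r, w = os.pipe()
# Unbuffered binary reader: read(n) returns whatever one OS read delivers.
reader = os.fdopen(r, "rb", buffering=0)

os.write(w, b"abc")
first = reader.read(10)    # only 3 bytes are available, so 3 come back, not 10

os.write(w, b"defg")
second = reader.read(10)   # the stream was NOT at EOF: more data arrives

os.close(w)                # the writer closes, so the stream really ends now
third = reader.read(10)    # an empty bytes object is the actual EOF signal
reader.close()

print(first, second, third)
```

    The first read came back short even though the stream was still open; only the empty result marks EOF.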

    What are some of the reasons a Python file-like method returns fewer than n bytes?

    1. EOF is certainly a common reason;
    2. A network socket may time out on read yet remain open;
    3. Reading exactly n bytes may cause a break between logical multi-byte units (such as \r\n in text mode and, I think, a multi-byte character in Unicode) or within some underlying data structure not known to you;
    4. The file is in non-blocking mode and another process begins to access the file;
    5. Temporary non-access to the file;
    6. An underlying error condition, potentially temporary, on the file, disc, network, etc.;
    7. The program received a signal, but the signal handler ignored it.
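
    If you genuinely need n bytes rather than "up to n", one option is to keep reading until the count is reached or an empty result arrives. The following is only a sketch: read_exactly and TrickleStream are hypothetical names invented for this example, and it assumes a blocking binary stream.

```python
def read_exactly(stream, n):
    """Collect up to n bytes, looping over short reads; stop early at EOF.

    Hypothetical helper. Assumes a blocking binary stream whose read()
    returns b"" only at EOF.
    """
    parts = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:              # empty result: real EOF
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


class TrickleStream:
    """Toy stream that never returns more than 2 bytes per read() call."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0

    def read(self, n):
        chunk = self._payload[self._pos:self._pos + min(n, 2)]
        self._pos += len(chunk)
        return chunk


stream = TrickleStream(b"hello world")
head = read_exactly(stream, 5)     # assembled from several short reads
rest = read_exactly(stream, 100)   # fewer than 100 bytes remain; stops at EOF
print(head, rest)
```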

    I would rewrite your code in this manner:

    with open(filename, 'rb') as f:
        while True:
            s = f.read(max_size)   # returns at most max_size bytes
            if not s:              # an empty result means EOF
                break

            # process the data in s...
    

    Or, write a generator:

    def blocks(infile, bufsize=1024):
        """Yield successive chunks of at most bufsize bytes until EOF."""
        while True:
            try:
                data = infile.read(bufsize)
            except OSError as e:
                # In Python 3, IOError is an alias of OSError
                print("I/O error({0}): {1}".format(e.errno, e.strerror))
                break
            if not data:   # an empty result means EOF
                break
            yield data

    with open('somefile', 'rb') as f:
        for block in blocks(f, 2**16):
            # process a block that COULD be up to 65,536 bytes long
            pass
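
    A more compact variant of the same generator idea, shown here only as an alternative sketch, uses the two-argument form of the built-in iter(), which calls a function repeatedly until it returns a sentinel value. The name blocks_iter is made up for this example, and the b"" sentinel assumes a binary-mode file (use "" for text mode).

```python
import io
from functools import partial


def blocks_iter(infile, bufsize=1024):
    """Iterate over chunks of at most bufsize bytes; stops at the b"" sentinel."""
    return iter(partial(infile.read, bufsize), b"")


# In-memory stand-in for a file opened with 'rb'.
source = io.BytesIO(b"x" * 2500)
sizes = [len(block) for block in blocks_iter(source, 1024)]
print(sizes)
```

    Unlike the generator version, this form has no place to catch an I/O error, so any exception from read() propagates to the caller.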
    
