“OSError: [Errno 22] Invalid argument” when read()ing a huge file

故里飘歌 2020-12-18 20:37

I'm trying to write a small script that prints the checksum of a file (using some code from https://gist.github.com/Zireael-N/ed36997fd1a967d78cb2):

import
2 Answers
  • 2020-12-18 21:03

    There have been several issues over the history of Python (most fixed in recent versions) with reading more than 2-4 GB at once from a file handle. An unfixable variant of the problem also occurs on 32-bit builds of Python, which simply lack the virtual address space to allocate the buffer; that one isn't I/O related, but it shows up most often when slurping large files. The workaround for hashing is to update the hash in fixed-size chunks, which is a good idea anyway, since counting on RAM being larger than the file is a poor plan. The most straightforward approach is to change your code to:

    import hashlib

    with open(file, 'rb') as f:
        hasher = hashlib.sha256()  # Make an empty hasher to update piecemeal
        while True:
            block = f.read(64 * (1 << 20))  # Read 64 MiB at a time; big, but not memory-busting
            if not block:  # Reached EOF
                break
            hasher.update(block)  # Update with the new block
    print('SHA256 of file is %s' % hasher.hexdigest())  # Finalize to compute the digest
    

    If you're feeling fancy, you can "simplify" the loop using two-arg iter and a little functools magic (this needs import functools at the top), replacing the whole while loop with:

    for block in iter(functools.partial(f.read, 64 * (1 << 20)), b''):
        hasher.update(block)
    
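    In case the two-argument form of iter is unfamiliar: it keeps calling the callable until it returns the sentinel. A throwaway illustration of just that mechanism, using an in-memory buffer instead of a real file (the byte string is made-up sample data):

    import functools
    import io

    buf = io.BytesIO(b'abcdefghij')
    for chunk in iter(functools.partial(buf.read, 4), b''):  # stop once read() returns b''
        print(chunk)  # b'abcd', then b'efgh', then b'ij'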

    Or, on Python 3.8+, with the walrus operator :=, it's simpler still, with no imports and no hard-to-read tricks:

    while block := f.read(64 * (1 << 20)):  # Assigns and tests result in conditional!
        hasher.update(block)
    
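    Putting the pieces together, here is a minimal self-contained sketch of the chunked approach (the 64 MiB chunk size matches the snippets above; taking the path from sys.argv is just an assumption for the demo, not something from the question's gist):

    import hashlib
    import sys

    def sha256_of(path, chunk_size=64 * (1 << 20)):
        # Hash the file in fixed-size chunks so no single read() is huge
        hasher = hashlib.sha256()
        with open(path, 'rb') as f:
            while block := f.read(chunk_size):  # Python 3.8+ walrus form
                hasher.update(block)
        return hasher.hexdigest()

    if __name__ == '__main__':
        print(sha256_of(sys.argv[1]))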
  • 2020-12-18 21:09

    Wow, this can be much simpler. Just read the file line by line:

    with open('big-file.txt') as f:
        for line in f:
            print(line)
    
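    If the goal is still the checksum from the question, the same line-by-line idea would need binary mode and a hasher; a rough sketch under that assumption (the file name is just a placeholder):

    import hashlib

    hasher = hashlib.sha256()
    with open('big-file.txt', 'rb') as f:  # binary mode, so the exact bytes get hashed
        for line in f:  # each iteration reads up to and including the next b'\n'
            hasher.update(line)
    print(hasher.hexdigest())

    One caveat: if the file contains no newlines at all, a single "line" is the entire file, so in the worst case this does not avoid the huge read that triggered the error.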