How to protect myself from a gzip or bzip2 bomb?


This is related to the question about zip bombs, but with gzip or bzip2 compression in mind, e.g. a web service accepting .tar.gz files.

Python provides a handy tarfile module, but it does not seem to offer protection against such decompression bombs.

5 Answers
  • 2020-12-15 07:09

    I also need to handle zip bombs in uploaded zipfiles.

    I do this by creating a fixed-size tmpfs and unzipping to that. If the extracted data is too large, the tmpfs will run out of space and give an error.

    Here are the Linux commands to create a 200M tmpfs to unzip to:

    sudo mkdir -p /mnt/ziptmpfs
    echo 'tmpfs   /mnt/ziptmpfs         tmpfs   rw,nodev,nosuid,size=200M          0  0' | sudo tee -a /etc/fstab
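
    Note that the tmpfs still has to be mounted (e.g. sudo mount /mnt/ziptmpfs) before it can be used. A minimal Python 3 sketch of the extract-and-catch-the-error idea could then look like this; the function name, the ENOSPC handling and the return values are illustrative, not part of the original answer:

    import errno
    import tarfile

    def extract_to_tmpfs(archive_path, dest="/mnt/ziptmpfs"):
        # A bomb fills the fixed-size tmpfs, so extraction fails with ENOSPC.
        try:
            with tarfile.open(archive_path, "r:gz") as tar:
                tar.extractall(dest)
            return True
        except OSError as e:
            if e.errno == errno.ENOSPC:
                return False   # rejected: the archive expands past the tmpfs size
            raise

    Remember to empty the mount point between uploads, since leftovers from a previous extraction also count against the 200M.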
    
  • 2020-12-15 07:11

    This will determine the uncompressed size of the gzip stream while using limited memory (the scripts in this answer are Python 2):

    #!/usr/bin/python
    import sys
    import zlib
    f = open(sys.argv[1], "rb")
    z = zlib.decompressobj(15+16)
    total = 0
    while True:
        buf = z.unconsumed_tail
        if buf == "":
            buf = f.read(1024)
            if buf == "":
                break
        got = z.decompress(buf, 4096)
        if got == "":
            break
        total += len(got)
    print total
    if z.unused_data != "" or f.read(1024) != "":
        print "warning: more input after end of gzip stream"
    

    It will return a slight overestimate of the space required for all of the files in the tar file when extracted. The length includes those files as well as the tar directory information.

    The gzip.py code does not control the amount of data decompressed, except by virtue of the size of the input data; it reads 1024 compressed bytes at a time. So you can use gzip.py if you're ok with up to about 1056768 bytes of memory usage for the uncompressed data (1032 * 1024, where 1032:1 is the maximum compression ratio of deflate). The solution here instead uses zlib.decompress with its second argument, which limits the amount of uncompressed data returned per call; gzip.py does not.
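
    If that trade-off is acceptable, a minimal sketch of metering the output via gzip.GzipFile (reading a bounded chunk at a time and aborting once a limit is exceeded) might look like this; the function and the 100 MB limit are assumptions for illustration:

    import gzip

    def gzip_size_ok(path, limit=100 * 1024 * 1024):
        """Return True if the gzip stream decompresses to at most `limit` bytes."""
        total = 0
        with gzip.open(path, "rb") as g:
            while True:
                chunk = g.read(64 * 1024)   # bounded read, so bounded memory per call
                if not chunk:
                    return True
                total += len(chunk)
                if total > limit:
                    return False            # looks like a bomb; stop early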

    This will accurately determine the total size of the extracted tar entries by decoding the tar format:

    #!/usr/bin/python
    
    import sys
    import zlib
    
    def decompn(f, z, n):
        """Return n uncompressed bytes, or fewer if at the end of the compressed
           stream.  This only decompresses as much as necessary, in order to
           avoid excessive memory usage for highly compressed input.
        """
        blk = ""
        while len(blk) < n:
            buf = z.unconsumed_tail
            if buf == "":
                buf = f.read(1024)
            got = z.decompress(buf, n - len(blk))
            blk += got
            if got == "":
                break
        return blk
    
    f = open(sys.argv[1], "rb")
    z = zlib.decompressobj(15+16)
    total = 0
    left = 0
    while True:
        blk = decompn(f, z, 512)
        if len(blk) < 512:
            break
        if left == 0:
            if blk == "\0"*512:
                continue
            if blk[156] in ["1", "2", "3", "4", "5", "6"]:
                continue
            if blk[124] == "\x80":
                # base-256 (binary) size field
                size = 0
                for i in range(125, 136):
                    size <<= 8
                    size += ord(blk[i])
            else:
                size = int(blk[124:136].split()[0].split("\0")[0], 8)
            if blk[156] not in ["x", "g", "X", "L", "K"]:
                total += size
            left = (size + 511) // 512
        else:
            left -= 1
    print total
    if blk != "":
        print "warning: partial final block"
    if left != 0:
        print "warning: tar file ended in the middle of an entry"
    if z.unused_data != "" or f.read(1024) != "":
        print "warning: more input after end of gzip stream"
    

    You could use a variant of this to scan the tar file for bombs. This has the advantage of finding a large size in the header information before you even have to decompress that data.
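
    A hedged sketch of such a variant, with the size check factored out of the loop above (Python 2, like the script; the max_entry threshold is an assumption):

    def check_header(blk, max_entry=100 * 1024 * 1024):
        """Given one 512-byte tar header block (as read by the loop above),
        return the entry size, refusing anything above max_entry bytes."""
        if blk[124] == "\x80":              # base-256 (binary) size field
            size = 0
            for i in range(125, 136):
                size = (size << 8) + ord(blk[i])
        else:
            size = int(blk[124:136].split()[0].split("\0")[0], 8)
        if size > max_entry:
            raise ValueError("rejected: tar entry claims %d bytes" % size)
        return size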

    As for .tar.bz2 archives, the Python bz2 library (at least as of 3.3) is unavoidably unsafe against bz2 bombs that consume too much memory. The bz2.decompress function does not offer a second argument like zlib.decompress does. This is made even worse by the fact that the bz2 format has a much, much higher maximum compression ratio than zlib due to run-length coding: bzip2 compresses 1 GB of zeros to 722 bytes. So you cannot meter the output of bz2.decompress by metering the input, as can be done with zlib.decompress even without the second argument. The lack of a limit on the decompressed output size is a fundamental flaw in the Python interface.
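
    As a quick illustration of that ratio (this snippet is not from the original answer; it feeds zeros through a BZ2Compressor in chunks so it never holds the full gigabyte in memory):

    import bz2

    comp = bz2.BZ2Compressor()
    compressed = 0
    chunk = b"\0" * (1024 * 1024)       # 1 MB of zeros
    for _ in range(1024):               # 1 GB in total
        compressed += len(comp.compress(chunk))
    compressed += len(comp.flush())
    print(compressed)                   # on the order of a few hundred bytes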

    I looked at _bz2module.c in 3.3 to see if there is an undocumented way to use it to avoid this problem. There is no way around it. The decompress function there just keeps growing the result buffer until it can decompress all of the provided input. _bz2module.c needs to be fixed.

  • 2020-12-15 07:13

    I guess the answer is: there is no easy, ready-made solution. Here is what I use now:

    class SafeUncompressor(object):
        """Small proxy class that enables external file object
        support for uncompressed, bzip2 and gzip files. Works transparently, and
        supports a maximum size to avoid zipbombs.
        """
        blocksize = 16 * 1024
    
        class FileTooLarge(Exception):
            pass
    
        def __init__(self, fileobj, maxsize=10*1024*1024):
            self.fileobj = fileobj
            self.name = getattr(self.fileobj, "name", None)
            self.maxsize = maxsize
            self.init()
    
        def init(self):
            import bz2
            import gzip
            self.pos = 0
            self.fileobj.seek(0)
            self.buf = ""
            self.format = "plain"
    
            magic = self.fileobj.read(2)
            if magic == '\037\213':
                self.format = "gzip"
                self.gzipobj = gzip.GzipFile(fileobj = self.fileobj, mode = 'r')
            elif magic == 'BZ':
                raise IOError, "bzip2 support in SafeUncompressor disabled, as self.bz2obj.decompress is not safe"
                self.format = "bz2"
                self.bz2obj = bz2.BZ2Decompressor()
            self.fileobj.seek(0)
    
    
        def read(self, size):
            b = [self.buf]
            x = len(self.buf)
            while x < size:
                if self.format == 'gzip':
                    data = self.gzipobj.read(self.blocksize)
                    if not data:
                        break
                elif self.format == 'bz2':
                    raw = self.fileobj.read(self.blocksize)
                    if not raw:
                        break
                    # this can already bomb here, to some extent.
                    # so disable bzip support until resolved.
                    # Also monitor http://stackoverflow.com/questions/13622706/how-to-protect-myself-from-a-gzip-or-bzip2-bomb for ideas
                    data = self.bz2obj.decompress(raw)
                else:
                    data = self.fileobj.read(self.blocksize)
                    if not data:
                        break
                b.append(data)
                x += len(data)
    
                if self.pos + x > self.maxsize:
                    self.buf = ""
                    self.pos = 0
                    raise SafeUncompressor.FileTooLarge, "Compressed file too large"
            self.buf = "".join(b)
    
            buf = self.buf[:size]
            self.buf = self.buf[size:]
            self.pos += len(buf)
            return buf
    
        def seek(self, pos, whence=0):
            if whence != 0:
                raise IOError, "SafeUncompressor only supports whence=0"
            if pos < self.pos:
                self.init()
            self.read(pos - self.pos)
    
        def tell(self):
            return self.pos
    

    It does not work well for bzip2, so that part of the code is disabled. The reason is that a single call to bz2.BZ2Decompressor.decompress can already produce an unmanageably large chunk of data.
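
    A minimal usage sketch (the file name and the size limit are made up for illustration):

    f = SafeUncompressor(open("upload.tar.gz", "rb"), maxsize=50 * 1024 * 1024)
    try:
        while True:
            data = f.read(64 * 1024)
            if not data:
                break
            # hand `data` to whatever consumes the uncompressed stream
    except SafeUncompressor.FileTooLarge:
        print("rejected: uncompressed data exceeds the configured limit")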

  • 2020-12-15 07:27

    You could use the resource module to limit the resources available to your process and its children.

    If you need to decompress in memory, you could set resource.RLIMIT_AS (or RLIMIT_DATA, RLIMIT_STACK), e.g., using a context manager to restore it to the previous value automatically:

    import contextlib
    import resource
    
    @contextlib.contextmanager
    def limit(limit, type=resource.RLIMIT_AS):
        soft_limit, hard_limit = resource.getrlimit(type)
        resource.setrlimit(type, (limit, hard_limit)) # set soft limit
        try:
            yield
        finally:
            resource.setrlimit(type, (soft_limit, hard_limit)) # restore
    
    with limit(1 << 30):  # 1 GB
        # do the thing that might try to consume all memory
        pass
    

    If the limit is reached, a MemoryError is raised.
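
    For example, wrapping an in-memory decompression (the file name and the 1 GB figure are illustrative, not part of the original answer):

    import gzip

    try:
        with limit(1 << 30):                 # 1 GB address-space cap
            with gzip.open("upload.gz", "rb") as g:
                data = g.read()              # whole-file, in-memory decompression
    except MemoryError:
        data = None                          # a bomb blows past the limit and lands here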

  • 2020-12-15 07:31

    If you develop for Linux, you can run the decompression in a separate process and use ulimit to limit its memory usage.

    import subprocess
    subprocess.Popen("ulimit -v %d; ./decompression_script.py %s" % (LIMIT, FILE), shell=True)
    

    Keep in mind that decompression_script.py should decompress the whole file in memory before writing anything to disk; otherwise a bomb would fill the disk instead of hitting the virtual-memory limit.
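
    A slightly fuller sketch of the same idea, checking the child's exit status (the limit and file name are illustrative; shell=True is required because the command string relies on the shell for ';' and ulimit):

    import subprocess

    LIMIT = 1024 * 1024          # ulimit -v takes kilobytes, so this is about 1 GB
    FILE = "upload.tar.gz"       # illustrative; beware of interpolating untrusted names

    proc = subprocess.Popen(
        "ulimit -v %d; ./decompression_script.py %s" % (LIMIT, FILE),
        shell=True)
    if proc.wait() != 0:
        print("rejected: decompression exceeded the memory limit or failed")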
