How to append a file to a tar file using Python's tarfile module?

猫巷女王i 2021-01-04 04:31

I want to append a file to a tar file. For example, test.tar.gz contains a.png, b.png, and c.png, and I have a new PNG file that is also named a.png.

2 answers
  • 2021-01-04 04:55

    David Dale asks:

    Update. According to the documentation, gz files cannot be opened in 'a' mode. If so, what is the best way to add or update files in an existing archive?

    Short answer:

    1. decompress / unpack archive
    2. replace / add file(s)
    3. repack / compress archive

    I tried to do it in memory using the file/stream interfaces of gzip and tarfile, but did not manage to get it running. The tarball has to be rewritten anyway, since replacing a file in place is apparently not possible, so it is simpler to just unpack the whole archive.
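
    Since the archive has to be rewritten in any case, one alternative to unpacking everything to disk is to stream the members from the old archive into a new one and skip the entry that is being replaced. The following is only a sketch of that idea, assuming a gzip-compressed archive; the helper name `replace_member` and the temporary-file handling are illustrative and not part of the script below.

```python
import os
import tarfile
import tempfile


def replace_member(archive, member_name, new_file):
    """Rewrite a .tar.gz so that member_name holds the bytes of new_file.

    Hypothetical helper: copies every other member unchanged into a new
    archive, then swaps the new archive in place of the old one.
    """
    archive_dir = os.path.dirname(os.path.abspath(archive))
    # Write next to the original so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(suffix=".tar.gz", dir=archive_dir)
    os.close(fd)
    try:
        with tarfile.open(archive, "r:gz") as src, \
                tarfile.open(tmp_name, "w:gz") as dst:
            for info in src:
                if info.name == member_name:
                    continue  # drop the old copy
                # Only regular files carry data; directories and links do not.
                data = src.extractfile(info) if info.isfile() else None
                dst.addfile(info, data)
            dst.add(new_file, arcname=member_name)
        os.replace(tmp_name, archive)
    except Exception:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

    A call such as `replace_member("test.tar.gz", "a.png", "new.png")` would then leave exactly one a.png in the archive, at the cost of recompressing all other members.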

    Wikipedia on tar, gzip.

    The script, if run directly, also tries to generate the test images "a.png, b.png, c.png, new.png" (requiring Pillow) and the initial archive "test.tar.gz" if they don't exist. It then decompresses the archive into a temporary directory, overwrites "a.png" with the contents of "new.png", and repacks all files, overwriting the original archive. Here are the individual files:

    a.png b.png c.png
    new.png

    Of course the script's functions can also be run sequentially in interactive mode, in order to have a chance to look at the files. Assuming the script's filename is "t.py":

    >>> from t import *
    >>> make_images()
    >>> make_archive()
    >>> replace_file()
    Workaround
    

    Here we go (the essential part is in replace_file()):

    #!python3
    #coding=utf-8
    """
    Replace a file in a .tar.gz archive via temporary files
       https://stackoverflow.com/questions/28361665/how-to-append-a-file-to-a-tar-file-use-python-tarfile-module
    """
    
    import sys        #
    import pathlib    # https://docs.python.org/3/library/pathlib.html
    import tempfile   # https://docs.python.org/3/library/tempfile.html
    import tarfile    # https://docs.python.org/3/library/tarfile.html
    #import gzip      # https://docs.python.org/3/library/gzip.html
    
    gfn = "test.tar.gz"
    iext = ".png"
    
    replace = "a"+iext
    replacement = "new"+iext
    
    def make_images():
        """Generate 4 test images with Pillow (PIL fork, http://pillow.readthedocs.io/)"""
        try:
            from PIL import Image, ImageDraw, ImageFont
            font = ImageFont.truetype("arial.ttf", 50)
    
            for k,v in {"a":"red", "b":"green", "c":"blue", "new":"orange"}.items():
                img = Image.new('RGB', (100, 100), color=v)
                d = ImageDraw.Draw(img)
                d.text((0, 0), k, fill=(0, 0, 0), font=font)
                img.save(k+iext)
        except Exception as e:
            print(e, file=sys.stderr)
            print("Could not create image files", file=sys.stderr)
            print("(pip install pillow)", file=sys.stderr)
    
    def make_archive():
        """Create gzip compressed tar file with the three images"""
        try:
            # the context manager closes the archive even if add() fails
            with tarfile.open(gfn, 'w:gz') as t:
                for f in 'abc':
                    t.add(f+iext)
        except Exception as e:
            print(e, file=sys.stderr)
            print("Could not create archive", file=sys.stderr)
    
    def make_files():
        """Generate sample images and archive"""
        mi = False
        for f in ['a','b','c','new']:
            p = pathlib.Path(f+iext)
            if not p.is_file():
                mi = True
        if mi:
            make_images()
        if not pathlib.Path(gfn).is_file():
            make_archive()
    
    def add_file_not():
        """Might even corrupt the existing file?"""
        print("Not possible: tarfile with \"a:gz\" - failing now:", file=sys.stderr)
        try:
            a = tarfile.open(gfn, 'a:gz')  # not possible!
            a.add(replacement, arcname=replace)
            a.close()
        except Exception as e:
            print(e, file=sys.stderr)
    
    def replace_file():
        """Extract archive to temporary directory, replace file, replace archive """
        print("Workaround", file=sys.stderr)
    
        # tempdir
        with tempfile.TemporaryDirectory() as td:
            # dirname to Path
            tdp = pathlib.Path(td)
    
            # extract archive to temporary directory
            with tarfile.open(gfn) as r:
                r.extractall(td)
    
            # print(list(tdp.iterdir()), file=sys.stderr)
    
            # replace target in temporary directory
            (tdp/replace).write_bytes( pathlib.Path(replacement).read_bytes() )
    
            # replace archive, from all files in tempdir
            with tarfile.open(gfn, "w:gz") as w:
                for f in tdp.iterdir():
                    w.add(f, arcname=f.name)
        #done
    
    def test():
        """as the name suggests, this just runs some tests ;-)"""
        make_files()
        #add_file_not()
        replace_file()
    
    if __name__ == "__main__":
        test()
    

    If you want to add files instead of replacing them, obviously just omit the line that replaces the temporary file, and copy the additional files into the temp directory. Make sure that pathlib.Path.iterdir then also "sees" the new files to be added to the new archive.


    I've put this in a somewhat more useful function:

    def targz_add(targz=None, src=None, dst=None, replace=False):
        """Add <src> file(s) to <targz> file, optionally replacing existing file(s).
        Uses temporary directory to modify archive contents.
        TODO: complete error handling...
        """
        import sys, pathlib, tempfile, tarfile
    
        # ensure targz exists
        tp = pathlib.Path(targz)
        if not tp.is_file():
            sys.stderr.write("Target '{}' does not exist!\n".format(tp) )
            return 1
    
        # src path(s)
        if not src:
            sys.stderr.write("No files given.\n")
            return 1
        # ensure iterable of string(s)
        if not isinstance(src, (tuple, list, set)):
            src = [src]
        # ensure path(s) exist
        srcp = []
        for s in src:
            sp = pathlib.Path(s)
            if not sp.is_file():
                sys.stderr.write("Source '{}' does not exist.\n".format(sp) )
            else:
                srcp.append(sp)
    
        if not srcp:
            sys.stderr.write("None of the files exist.\n")
            return 1
    
        # dst path(s) (filenames in archive)
        dstp = []
        if not dst:
            # default: use filename only
            dstp = [sp.name for sp in srcp]
        else:
            if callable(dst):
                # map dst to each Path, ensure results are Path
                dstp = [pathlib.Path(c) for c in map(dst, srcp)]
            elif not isinstance(dst, (tuple, list, set)):
                # ensure iterable of string(s)
                dstp = [pathlib.Path(dst).name]
            elif isinstance(dst, (tuple, list, set)):
                # convert each string to Path
                dstp = [pathlib.Path(d) for d in dst]
            else:
                # TODO directly support iterable of (src,dst) tuples
                sys.stderr.write("Please fix me, I cannot handle the destination(s) '{}'\n".format(dst) )
                return 1
    
        if not dstp:
            sys.stderr.write("No destination names given.\n")
            return 1
    
        # combine src and dst paths
        sdp = zip(srcp, dstp) # iterator of tuples
    
        # temporary directory
        with tempfile.TemporaryDirectory() as tempdir:
            tempdirp = pathlib.Path(tempdir)
    
            # extract original archive to temporary directory
            with tarfile.open(tp) as r:
                r.extractall(tempdirp)
    
            # copy source(s) to target in temporary directory, optionally replacing it
            for s,d in sdp:
                dp = tempdirp/d
    
                # TODO extend to allow flag individually
                # is_file must be called; the bare method object is always truthy
                if not dp.is_file() or replace:
                    sys.stderr.write("Writing '{1}' (from '{0}')\n".format(s,d) )
                    dp.write_bytes( s.read_bytes() )
                else:
                    sys.stderr.write("Skipping '{1}' (from '{0}')\n".format(s,d) )
    
            # replace original archive with new archive from all files in tempdir
            with tarfile.open(tp, "w:gz") as w:
                for f in tempdirp.iterdir():
                    w.add(f, arcname=f.name)
    
        return None
    

    And a few "tests" as examples:

    # targz_add("test.tar.gz", "new.png", "a.png")
    # targz_add("test.tar.gz", "new.png", "a.png", replace=True)
    # targz_add("test.tar.gz", ["new.png"], "a.png")
    # targz_add("test.tar.gz", "new.png", ["a.png"], replace=True)
    targz_add("test.tar.gz", "new.png", lambda x:str(x).replace("new","a"), replace=True)
    

    shutil also supports archives, but not adding files to one:

    https://docs.python.org/3/library/shutil.html#archiving-operations

    New in version 3.2.
    Changed in version 3.5: Added support for the xztar format.
    High-level utilities to create and read compressed and archived files are also provided. They rely on the zipfile and tarfile modules.
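
    That said, the unpack / modify / repack workaround can be expressed with shutil's archiving functions alone. A rough sketch, assuming a ".tar.gz" archive; the helper name `repack_with` is made up for illustration. Note that `shutil.make_archive` with `root_dir` typically stores member names with a leading "./".

```python
import pathlib
import shutil
import tempfile


def repack_with(archive, new_file, arcname):
    """Unpack archive, copy new_file in as arcname, and rebuild the archive.

    Hypothetical helper; assumes `archive` ends in ".tar.gz".
    """
    archive = pathlib.Path(archive).resolve()
    # make_archive appends the ".tar.gz" suffix itself
    base_name = str(archive)[: -len(".tar.gz")]
    with tempfile.TemporaryDirectory() as td:
        shutil.unpack_archive(str(archive), td, "gztar")
        # Adds the file, or overwrites an existing one with the same name.
        shutil.copyfile(str(new_file), str(pathlib.Path(td) / arcname))
        # Overwrites the original archive with the contents of td.
        return shutil.make_archive(base_name, "gztar", root_dir=td)
```

    Usage would look like `repack_with("test.tar.gz", "new.png", "a.png")`, which returns the path of the rebuilt archive.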


    Here's adding a file by extracting to memory using io.BytesIO, adding, and compressing:

    import io
    import gzip
    import tarfile
    
    gfn = "test.tar.gz"
    replace = "a.png"
    replacement = "new.png"
    
    print("reading {}".format(gfn))
    m = io.BytesIO()
    with gzip.open(gfn) as g:
        m.write(g.read())
    
    print("opening tar in memory")
    m.seek(0)
    with tarfile.open(fileobj=m, mode="a") as t:
        t.list()
        print("adding {} as {}".format(replacement, replace))
        t.add(replacement, arcname=replace)
        t.list()
    
    print("writing {}".format(gfn))
    m.seek(0)
    with gzip.open(gfn, "wb") as g:
        g.write(m.read())
    

    It prints:

    reading test.tar.gz
    opening tar in memory
    ?rw-rw-rw- 0/0        877 2018-04-11 07:38:57 a.png 
    ?rw-rw-rw- 0/0        827 2018-04-11 07:38:57 b.png 
    ?rw-rw-rw- 0/0        787 2018-04-11 07:38:57 c.png 
    adding new.png as a.png
    ?rw-rw-rw- 0/0        877 2018-04-11 07:38:57 a.png 
    ?rw-rw-rw- 0/0        827 2018-04-11 07:38:57 b.png 
    ?rw-rw-rw- 0/0        787 2018-04-11 07:38:57 c.png 
    -rw-rw-rw- 0/0       2108 2018-04-11 07:38:57 a.png 
    writing test.tar.gz
    

    Optimizations are welcome!
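
    As the second listing above shows, appending does not replace anything: the archive ends up with two entries named a.png. The tarfile module treats the last occurrence of a name as the current one when looking up a member by name. A small self-contained sketch that reproduces this with an in-memory tar (the helper `add_bytes` and the file contents are made-up placeholders):

```python
import io
import tarfile


def add_bytes(tar, name, payload):
    """Add `payload` to an open TarFile under `name` (illustrative helper)."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as t:
    add_bytes(t, "a.png", b"old contents")
    add_bytes(t, "b.png", b"other file")

buf.seek(0)
with tarfile.open(fileobj=buf, mode="a") as t:  # plain tar, so "a" is allowed
    add_bytes(t, "a.png", b"new contents")

buf.seek(0)
with tarfile.open(fileobj=buf, mode="r") as t:
    names = t.getnames()                    # both a.png entries are listed
    latest = t.extractfile("a.png").read()  # lookup by name uses the last one

print(names)
print(latest)
```

    If a single clean entry is needed, the archive still has to be rewritten without the old member.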

  • 2021-01-04 05:18

    From tarfile documentation:

    Note that 'a:gz' or 'a:bz2' is not possible. If mode is not suitable to open a certain (compressed) file for reading, ReadError is raised. Use mode 'r' to avoid this. If a compression method is not supported, CompressionError is raised.

    So I guess you should decompress the archive using the gzip library, add the files using tarfile's 'a:' mode, and then compress it again using gzip.
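
    The restriction is easy to check on a throwaway archive. A minimal sketch, using placeholder files in a temporary directory; the quoted note does not say which exception is raised for 'a:gz', so the hypothetical helper `try_append` catches broadly and only reports the type name:

```python
import pathlib
import tarfile
import tempfile


def try_append(archive, file_to_add, mode):
    """Try to append file_to_add to archive using the given mode.

    Returns None on success, or the exception type name on failure.
    """
    try:
        with tarfile.open(archive, mode) as t:
            t.add(file_to_add, arcname=pathlib.Path(file_to_add).name)
    except Exception as e:
        return type(e).__name__
    return None


with tempfile.TemporaryDirectory() as td:
    tdp = pathlib.Path(td)
    (tdp / "a.png").write_bytes(b"placeholder")
    (tdp / "new.png").write_bytes(b"placeholder 2")

    # One uncompressed and one gzip-compressed archive with the same content
    with tarfile.open(tdp / "plain.tar", "w") as t:
        t.add(tdp / "a.png", arcname="a.png")
    with tarfile.open(tdp / "packed.tar.gz", "w:gz") as t:
        t.add(tdp / "a.png", arcname="a.png")

    gz_result = try_append(tdp / "packed.tar.gz", tdp / "new.png", "a:gz")
    tar_result = try_append(tdp / "plain.tar", tdp / "new.png", "a:")

print("a:gz ->", gz_result)
print("a:   ->", tar_result)
```

    Appending to the uncompressed tar succeeds, while the 'a:gz' attempt fails before anything is written.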
