Is there a safe way to run a diff on two zip compressed files?

岁酱吖の 提交于 2019-12-18 03:54:25

问题


Seems this would not be a deterministic thing, or is there a way to do this reliably?


回答1:


If you're using gzip, you can do something like this:

# diff <(zcat file1.gz) <(zcat file2.gz)



回答2:


Reliable: unzip both, diff.

I have no idea if that answer's good enough for your use, but it works.




回答3:


In general, you cannot avoid decompressing and then comparing. Different compressors will result in different DEFLATEd byte streams, which when INFLATEd result in the same original text. You cannot simply compare the DEFLATEd data, one to another. That will FAIL in some cases.

But in a ZIP scenario, there is a CRC32 calculated and stored for each entry. So if you want to check files, you can simply compare the stored CRC32 associated to each DEFLATEd stream, with the caveats on the uniqueness properties of the CRC32 hash. It may fit your needs to compare the FileName and the CRC.

You would need a ZIP library that reads zip files and exposes those things as properties on the "ZipEntry" object. DotNetZip will do that for .NET apps.




回答4:


zipcmp compares the zip archives zip1 and zip2 and checks if they contain the same files, comparing their names, uncompressed sizes, and CRCs. File order and compressed size differences are ignored.

sudo apt-get install zipcmp




回答5:


This isn't particularly elegant, but you can use the FileMerge application that comes with Mac OS X developer tools to compare the contents of zip files using a custom filter.

Create a script ~/bin/zip_filemerge_filter.bash with contents:

#!/bin/bash
##
#  List the size, CR-32 checksum, and file path of each file in a zip archive,
#  sorted in order by file path.
##
unzip -v -l "${1}" | cut -c 1-9,59-,49-57 | sort -k3
exit $?

Make the script executable (chmod +x ~/bin/zip_filemerge_filter.bash).

Open FileMerge, open the Preferences, and go to the "Filters" tab. Add an item to the list with: Extension:"zip", Filter:"~/bin/zip_filemerge_filter.bash $(FILE)", Display: Filtered, Apply*: No. (I've also added the filer for .jar and .war files.)

Then use FileMerge (or the command line "opendiff" wrapper) to compare two .zip files.

This won't let you diff the contents of files within the zip archives, but will let you quickly see which files appear within one only archive and which files exist in both but have different content (i.e. different size and/or checksum).




回答6:


Beyond compare has no problem with this.




回答7:


Actually gzip and bzip2 both come with dedicated tools for doing that.

With gzip:

$ zdiff file1.gz file2.gz

With bzip2:

$ bzdiff file1.bz2 file2.bz2

But keep in mind that for very large files, you might run into memory issues (I originally came here to find out about how to solve them, so I don't have the answer yet).




回答8:


A python solution for zip files:

import difflib
import zipfile

def diff(filename1, filename2):
    differs = False

    z1 = zipfile.ZipFile(open(filename1))
    z2 = zipfile.ZipFile(open(filename2))
    if len(z1.infolist()) != len(z2.infolist()):
        print "number of archive elements differ: {} in {} vs {} in {}".format(
            len(z1.infolist()), z1.filename, len(z2.infolist()), z2.filename)
        return 1
    for zipentry in z1.infolist():
        if zipentry.filename not in z2.namelist():
            print "no file named {} found in {}".format(zipentry.filename,
                                                        z2.filename)
            differs = True
        else:
            diff = difflib.ndiff(z1.open(zipentry.filename),
                                 z2.open(zipentry.filename))
            delta = ''.join(x[2:] for x in diff
                            if x.startswith('- ') or x.startswith('+ '))
            if delta:
                differs = True
                print "content for {} differs:\n{}".format(
                    zipentry.filename, delta)
    if not differs:
        print "all files are the same"
        return 0
    return 1

Use as

diff(filename1, filename2)

It compares files line-by-line in memory and shows changes.




回答9:


WinMerge (windows only) has lots of features and one of them is:

  • Archive file support using 7-Zip



回答10:


I found relief with this simple Perl script: diffzips.pl

It recursively diffs every zip file inside the original zip, which is especially useful for different Java package formats: jar, war, and ear.

zipcmp uses more simple approach and it doesn't recurse into archived zips.




回答11:


I generally use an approach like @mrabbit's but run 2 unzip commands and diff the output as required. For example I need to compare 2 Java WAR files.

$ sdiff --width 160 \
   <(unzip -l -v my_num1.war | cut -c 1-9,59-,49-57 | sort -k3) \
   <(unzip -l -v my_num2.war | cut -c 1-9,59-,49-57 | sort -k3)

Resulting in output like so:

--------          -------                                                       --------          -------
Archive:                                                                        Archive:
-------- -------- ----                                                          -------- -------- ----
48619281          130 files                                                   | 51043693          130 files
    1116 060ccc56 index.jsp                                                         1116 060ccc56 index.jsp
       0 00000000 META-INF/                                                            0 00000000 META-INF/
     155 b50f41aa META-INF/MANIFEST.MF                                        |      155 701f1623 META-INF/MANIFEST.MF
 Length   CRC-32  Name                                                           Length   CRC-32  Name
    1179 b42096f1 version.jsp                                                       1179 b42096f1 version.jsp
       0 00000000 WEB-INF/                                                             0 00000000 WEB-INF/
       0 00000000 WEB-INF/classes/                                                     0 00000000 WEB-INF/classes/
       0 00000000 WEB-INF/classes/com/                                                 0 00000000 WEB-INF/classes/com/
...
...



回答12:


I gave up trying to use existing tools and wrote a little bash script that works for me:

#!/bin/bash
# Author: Onno Benschop, onno@itmaze.com.au
# Note: This requires enough space for both archives to be extracted in the tempdir

if [ $# -ne 2 ] ; then
  echo Usage: $(basename "$0") zip1 zip2
  exit
fi

# Make temporary directories
archive_1=$(mktemp -d)
archive_2=$(mktemp -d)

# Unzip the archives
unzip -qqd"${archive_1}" "$1"
unzip -qqd"${archive_2}" "$2"

# Compare them
diff -r "${archive_1}" "${archive_2}"

# Remove the temporary directories
rm -rf "${archive_1}" "${archive_2}"


来源:https://stackoverflow.com/questions/587442/is-there-a-safe-way-to-run-a-diff-on-two-zip-compressed-files

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!