diff files inside of zip without extracting it [closed]

寵の児 submitted on 2021-02-07 04:57:14

Question


Is there any way to perform a diff operation on two files in two zips without extracting them? If not, is there any other workaround to compare them without extracting?

Thanks.


Answer 1:


Combining the responses so far, the following bash function compares the file listings of the two zip files. The listings use verbose output (unzip -v), so checksums can be compared. Output is sorted by filename (sort -k8) to allow side-by-side comparison, and the diff output is widened (-W200) so the filenames are visible in the side-by-side view.

function zipdiff() { diff -W200 -y <(unzip -vql "$1" | sort -k8) <(unzip -vql "$2" | sort -k8); }

This can be added to your ~/.bashrc file to be used from any console. It can be used with zipdiff a.zip b.zip. Piping the output to less or redirecting to a file is helpful for large zip files.




Answer 2:


unzip -l will list the contents of a zip file. You can then pass that to diff in the normal manner as mentioned here: https://askubuntu.com/questions/229447/how-do-i-diff-the-output-of-two-commands

So for example if you had two zip files:

foo.zip
bar.zip

You could run diff -y <(unzip -l foo.zip) <(unzip -l bar.zip) to do a side-by-side diff of the file listings of the two archives.

Hope that helps!




Answer 3:


If you want to diff two files (as in see the difference) you have to extract them - even if only to memory!

In order to see the diff of two files in two zips, you can do something like this (no error checking whatsoever):

# define a little bash function
function zipdiff () { diff -u <(unzip -p "$1" "$2") <(unzip -p "$3" "$4"); }

# test it: create a.zip and b.zip, each with a different file.txt
echo hello >file.txt; zip a.zip file.txt
echo world >file.txt; zip b.zip file.txt

zipdiff a.zip file.txt b.zip file.txt
--- /dev/fd/63  2016-02-23 18:18:09.000000000 +0100
+++ /dev/fd/62  2016-02-23 18:18:09.000000000 +0100
@@ -1 +1 @@
-hello
+world

Note: unzip -p extracts files to pipe (stdout).

If you only want to know if the files are different you can inspect their checksums using

unzip -v -l zipfile [file_to_inspect]

Note: -v means verbose and -l means list contents.

unzip -v -l a.zip 
Archive:  a.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       6  Stored        6   0% 2016-02-23 18:23 363a3020  file.txt
--------          -------  ---                            -------
       6                6   0%                            1 file

unzip -v -l b.zip 
Archive:  b.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       6  Stored        6   0% 2016-02-23 18:23 dd3861a8  file.txt
--------          -------  ---                            -------
       6                6   0%                            1 file 

In the example above you can see that the checksums (CRC-32) are different.

You might also be interested in this project: https://github.com/nhnb/zipdiff




Answer 4:


I wanted the actual diff between the files in the zips in a readable format. Here is a bash function that I wrote for this purpose which makes use of git. This has a good UX if you already use git as part of your normal workflow and can read git diffs.

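# usage: zipdiff before.zip after.zip
function zipdiff {
  local before after tempdir
  # resolve both paths first, so relative and absolute paths both work
  before=$(realpath "$1") || return
  after=$(realpath "$2") || return
  tempdir=$(mktemp -d) || return
  # run in a subshell so the caller's working directory is left unchanged
  (
    cd "$tempdir" || exit
    git init -q
    unzip -qq "$before"
    git add -A
    # inline identity so the commit also works without a global git config
    git -c user.name=zipdiff -c user.email=zipdiff@localhost \
      commit -q --allow-empty -m "before"
    # delete the "before" files (keeping .git) so removed entries show up too
    git rm -rqf --ignore-unmatch .
    unzip -qq -o "$after"
    git add -A
    git diff --cached
  )
  rm -rf "$tempdir"
}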




Answer 5:


Compressed File Contents Only

I was looking for a way to compare the contents of the files stored in the zipfile, but not other metadata. Consider the following:

$ echo foo > foo.txt
$ zip now.zip foo.txt
  adding: foo.txt (stored 0%)
$ zip later.zip foo.txt
  adding: foo.txt (stored 0%)
$ diff now.zip later.zip 
Binary files now.zip and later.zip differ

Conceptually, this makes no sense; I ran the same command on the same inputs and got 2 different outputs! The difference is the metadata, which stores the date the file was added!

$ unzip -v now.zip 
Archive:  now.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       4  Stored        4   0% 04-08-2020 23:27 7e3265a8  foo.txt
--------          -------  ---                            -------
       4                4   0%                            1 file
$ unzip -v later.zip
Archive:  later.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
       4  Stored        4   0% 04-08-2020 23:28 7e3265a8  foo.txt
--------          -------  ---                            -------
       4                4   0%                            1 file

Note: I manually edited the time of the second file here from 23:27 to 23:28 for clarity. The timestamp stored in the archive also includes seconds, which in my case differed (so a binary diff would still fail), even though the command's output does not show them.

So to diff the files only, we must ignore the date fields. unzip -vqq will get us a better summary:

$ unzip -vqq now.zip
       4  Stored        4   0% 04-08-2020 23:27 7e3265a8  foo.txt

So let's mask out the fields (we don't care about dates or compression metrics) and sort the files:

$ unzip -vqq now.zip  | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3
4      7e3265a8 foo.txt

TL;DR

The command to diff 2 zipfiles (a.zip and b.zip) is

diff \
  <(unzip -vqq a.zip  | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3) \
  <(unzip -vqq b.zip  | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3)



Answer 6:


By postprocessing the output of zipcmp, you can recurse through the archives to obtain a more detailed summary of the differences between them.

#!/bin/bash

# process zipcmp's output to do true diffs of archive contents
# 1. grep removes the '+++' and '---' lines from zipcmp's output
# 2. awk prints the final column of output (the entry name)
# 3. sort -u dedupes names reported on both the '-' and '+' side
zip1=${1?No first zip}
zip2=${2?No second zip}
zipcmp "$zip1" "$zip2" \
    | grep -Ev '^[+-]{3}' \
    | awk '{print $NF}' \
    | sort -u \
    | while IFS= read -r badfile; do
        echo "diffing $badfile"
        diff <(unzip -p "$zip1" "$badfile") <(unzip -p "$zip2" "$badfile")
      done




Answer 7:


If you just need to check whether files are equal, you can compare their CRC-32 checksums, which are stored in each entry's local file header and in the archive's central directory.




Answer 8:


The comp_zip tool in the open-source library Zip-Ada performs a comparison without extraction: it compares contents, reports files of a.zip missing from b.zip, and checks the integrity of both archives.




Answer 9:


Web tools such as https://www.diffnow.com/compare-files give a clear visual overview of which files in the zip have changed.

This is very convenient for zip files that are not too large, and nothing needs to be installed. It works not only on Linux but also on other operating systems, including Windows and Mac.

The tools discussed in the other answers obviously offer more advanced options and can be faster for larger zip files.




Answer 10:


Some command-line tools exist:

  1. diffzips.pl, written in Perl.
  2. zipdiff, written in Java.
  3. zipdiff, a .NET port of the previous one.
  4. zipcmp, written in C, from the libzip library.
  5. zcmp and zdiff from gzip, which can be used on zip files that contain a single member.

I am a happy user of diffzips.pl for comparing the contents of EPUB files. diffzips.pl also has the advantage of being recursive: it compares zip files nested inside the parent zip.



Source: https://stackoverflow.com/questions/35581274/diff-files-inside-of-zip-without-extracting-it
