tar package has different checksum for exactly the same content

Posted by 我怕爱的太早我们不能终老 on 2020-08-09 13:00:13

Question


Packaging a folder on a SUSE Linux Enterprise Server 12 SP3 system using GNU tar 1.30 always gives different md5 checksums although the file contents do not change.

I run tar to package my folder that contains a simple text file:

tar cf package.tar folder

Although the content is exactly the same, the resulting tar file has a different md5 (or sha1) checksum every time:

$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
e6383218596fffe118758b46e0edad1d  package.tar
$> rm -rf package.tar && tar cf package.tar folder && md5sum package.tar
1c5aa972e5bfa2ec78e63a9b3116e027  package.tar

Because the Linux file system seems to hand files to tar in an arbitrary order, I tried the --sort option, but it does not fix the checksum mismatch. tar's --mtime option does not help either, since the modification times are already exactly the same.

I appreciate any help on this.


Answer 1:


The archives you provided contain pax extended headers. A quick glance at their structure reveals that they differ in these two fields:

  1. The process ID of the pax process (as part of a name for the extended header in the ustar header block, and consequently the checksum for this ustar header block).
  2. The atime (access time) in the extended header.

One workaround for reproducible archive creation is to force the older ustar format (rather than the pax/POSIX format):

tar --format=ustar -cf package.tar folder

The other choice is to manually set the extended name and delete the atime while preserving the pax format:

tar --format=pax --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime -cf package.tar folder

Now the md5sum should be the same for both archives.
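To check the ustar workaround on your own machine, a minimal sketch along these lines may help. It assumes GNU tar and a POSIX shell; the temporary directory, the file name file.txt, and the `result` variable are made up for illustration. It packs the same folder twice, one second apart, and compares the two archives byte for byte.

```shell
#!/bin/sh
# Sketch: pack the same folder twice with --format=ustar and compare the results.
# Assumes GNU tar; all paths below are illustrative.
set -eu

workdir=$(mktemp -d)
mkdir "$workdir/folder"
printf 'hello\n' > "$workdir/folder/file.txt"

# Two runs, one second apart, so that any time-dependent field would show up.
tar --format=ustar -cf "$workdir/first.tar" -C "$workdir" folder
sleep 1
tar --format=ustar -cf "$workdir/second.tar" -C "$workdir" folder

# cmp -s exits with status 0 only when both files are byte-identical.
if cmp -s "$workdir/first.tar" "$workdir/second.tar"; then
    result=identical
else
    result=different
fi
echo "ustar archives are $result"

rm -rf "$workdir"
```

The same comparison should work for the pax variant by swapping in the --format=pax and --pax-option arguments shown above.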




Answer 2:


The headers of tar files contain several fields that can differ each time you re-tar a set of files. For instance, the last access time and the modification time will likely differ between runs.

According to this article it is possible with GNU tar to produce identical output for identical input by doing the following:

# requires GNU Tar 1.28+
$ tar --sort=name \
      --mtime="2018-10-05 00:00Z" \
      --owner=0 --group=0 --numeric-owner \
      -cf product.tar build
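To see what these flags buy you, here is a rough sketch that packs two separately created copies of the same tree, once plainly and once with the normalizing options. It assumes GNU tar 1.28 or later and md5sum; the helper names `make_tree`, `pack_plain` and `pack_normalized`, the file names, and the timestamps are invented for this example. Forcing --format=ustar is an extra assumption, added here only so that pax extended headers cannot interfere.

```shell
#!/bin/sh
# Sketch: identical content with different mtimes, packed with and without
# the normalizing flags. Assumes GNU tar 1.28+ and md5sum.
set -eu

workdir=$(mktemp -d)

# make_tree DIR STAMP: hypothetical helper that creates DIR/build with fixed
# contents and sets every mtime to STAMP (touch -t format: CCYYMMDDhhmm).
make_tree() {
    mkdir -p "$1/build"
    printf 'alpha\n' > "$1/build/a.txt"
    printf 'beta\n'  > "$1/build/b.txt"
    touch -t "$2" "$1/build/a.txt" "$1/build/b.txt" "$1/build"
}

make_tree "$workdir/one" 202001010000
make_tree "$workdir/two" 202106151230

# Plain archive: the differing mtimes end up in the headers.
pack_plain() {
    tar --format=ustar -cf "$2" -C "$1" build
}

# Normalized archive: fixed member order, fixed mtime, fixed numeric owner.
# --format=ustar is an assumption of this sketch, not part of the command above.
pack_normalized() {
    tar --format=ustar --sort=name \
        --mtime="2018-10-05 00:00Z" \
        --owner=0 --group=0 --numeric-owner \
        -cf "$2" -C "$1" build
}

pack_plain "$workdir/one" "$workdir/plain_one.tar"
pack_plain "$workdir/two" "$workdir/plain_two.tar"
pack_normalized "$workdir/one" "$workdir/norm_one.tar"
pack_normalized "$workdir/two" "$workdir/norm_two.tar"

# Reading from stdin keeps the file name out of the md5sum output.
plain_one=$(md5sum < "$workdir/plain_one.tar")
plain_two=$(md5sum < "$workdir/plain_two.tar")
norm_one=$(md5sum < "$workdir/norm_one.tar")
norm_two=$(md5sum < "$workdir/norm_two.tar")

echo "plain:      $plain_one / $plain_two"
echo "normalized: $norm_one / $norm_two"

rm -rf "$workdir"
```

The plain checksums differ because each header records the file's own modification time, while the normalized pair should match because --mtime overrides that field for every member.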


Source: https://stackoverflow.com/questions/52668432/tar-package-has-different-checksum-for-exactly-the-same-content
