filecmp.cmp() ignoring differing os.stat() signatures?

Submitted by 跟風遠走 on 2019-12-06 03:40:04

Question


The Python 2 docs for filecmp.cmp() say:

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

That sounds as if two files that are identical except for their os.stat() signatures will be considered unequal. However, this does not seem to be the case, as the following code snippet illustrates:

import filecmp
import os
import shutil
import time

with open('test_file_1', 'w') as f:
    f.write('file contents')
shutil.copy('test_file_1', 'test_file_2')
time.sleep(5)  # pause to get a different time-stamp
os.utime('test_file_2', None)  # change copied file's time-stamp

print 'test_file_1:', os.stat('test_file_1')
print 'test_file_2:', os.stat('test_file_2')
print 'filecmp.cmp():', filecmp.cmp('test_file_1', 'test_file_2')

Output:

test_file_1: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0,
  st_uid=0, st_gid=0, st_size=13L, st_atime=1320719522L, st_mtime=1320720444L, 
  st_ctime=1320719522L)
test_file_2: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0, 
  st_uid=0, st_gid=0, st_size=13L, st_atime=1320720504L, st_mtime=1320720504L, 
  st_ctime=1320719539L)
filecmp.cmp(): True

As you can see, the two files' time stamps (st_atime, st_mtime, and st_ctime) are clearly not the same, yet filecmp.cmp() indicates that the two files are identical. Am I misunderstanding something, or is there a bug in either filecmp.cmp()'s implementation or its documentation?

Update

The Python 3 documentation has been rephrased and currently says the following, which IMHO is an improvement only in the sense that it better implies that files with different time stamps might still be considered equal even when shallow is True.

If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.

FWIW I think it would have been better to simply have said something like this:

If shallow is true, file content is compared only when os.stat() signatures are unequal.


Answer 1:


You're misunderstanding the documentation. Line #2 says:

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

Files with identical os.stat() signatures are taken to be equal, but the logical inverse is not true: files with unequal os.stat() signatures are not necessarily taken to be unequal. Rather, they may or may not be equal, so the actual file contents are compared. Since the file contents are found to be identical, filecmp.cmp() returns True.
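To make the order of checks concrete, here is a simplified sketch modelled on how CPython's filecmp.cmp() behaves, written for Python 3 and leaving out the result cache. The names signature and cmp_sketch are made up for this illustration and are not part of the filecmp API; the signature used is the file type, size, and modification time.

```python
import os
import stat


def signature(st):
    """Reduce a stat result to (file type, size, mtime)."""
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def cmp_sketch(f1, f2, shallow=True):
    """Simplified outline of the filecmp.cmp() decision order (no caching)."""
    s1 = signature(os.stat(f1))
    s2 = signature(os.stat(f2))
    # Anything that is not a regular file is never reported as equal.
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False
    # Shallow shortcut: identical signatures are trusted without reading.
    if shallow and s1 == s2:
        return True
    # Files of different sizes cannot have identical contents.
    if s1[1] != s2[1]:
        return False
    # Signatures differ (or shallow is false): compare the actual bytes.
    with open(f1, 'rb') as a, open(f2, 'rb') as b:
        return a.read() == b.read()
```

Note the consequence of the shortcut: with shallow left at its default, two files that happen to share size and modification time are reported equal without their contents ever being read.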

As per the third clause, once it determines that the files are equal, it will cache that result and not bother re-reading the file contents if you ask it to compare the same files again, so long as those files' os.stat structures don't change.
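Because cached results are tied to the files' signatures, a file that is rewritten without a change in size or time stamp could in principle still match an old cache entry. In Python 3.4 and later, filecmp.clear_cache() discards the cache. A minimal sketch, where compare_fresh is a hypothetical helper name:

```python
import filecmp


def compare_fresh(f1, f2):
    """Byte-for-byte comparison that ignores any earlier cached result."""
    # Drop results remembered from previous filecmp.cmp() calls.
    filecmp.clear_cache()
    # shallow=False skips the "identical signature" shortcut.
    return filecmp.cmp(f1, f2, shallow=False)
```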




Answer 2:


It seems that 'rolling your own' is indeed what is required to produce a desirable result. It would simply be nice if the documentation were clear enough to make a casual reader reach that conclusion.

Here's the function I am presently using:

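import os

def cmp_stat_weak(a, b):
    """Return True if files a and b have the same size and modification time."""
    sa = os.stat(a)
    sb = os.stat(b)
    # Contents are never read; only the two stat fields are compared.
    return sa.st_size == sb.st_size and sa.st_mtime == sb.st_mtime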
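For comparison, here is a small Python 3 sketch that rebuilds the scenario from the question and runs both checks side by side. The demo function and the temporary directory are assumptions made for illustration; cmp_stat_weak is repeated so the snippet runs on its own, and the time stamps are set explicitly with os.utime() rather than by sleeping.

```python
import filecmp
import os
import tempfile


def cmp_stat_weak(a, b):
    sa = os.stat(a)
    sb = os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime == sb.st_mtime


def demo():
    """Compare two same-content files whose modification times differ."""
    with tempfile.TemporaryDirectory() as d:
        p1 = os.path.join(d, 'test_file_1')
        p2 = os.path.join(d, 'test_file_2')
        for p in (p1, p2):
            with open(p, 'w') as f:
                f.write('file contents')
        # Explicit (atime, mtime) pairs, 60 seconds apart.
        os.utime(p1, (1320720444, 1320720444))
        os.utime(p2, (1320720504, 1320720504))
        return filecmp.cmp(p1, p2), cmp_stat_weak(p1, p2)


print(demo())  # (True, False)
```

filecmp.cmp() falls through to a content comparison and reports the files equal, while cmp_stat_weak() reports them unequal because the modification times differ.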


Source: https://stackoverflow.com/questions/8045564/filecmp-cmp-ignoring-differing-os-stat-signatures
