Recursively compare two directories to ensure they have the same files and subdirectories

猫巷女王i 2020-12-23 20:49

From what I observe filecmp.dircmp is recursive, but inadequate for my needs, at least in py2. I want to compare two directories and all their contained files. Do

11 answers
  • 2020-12-23 21:27
    import filecmp
    import os

    def same(dir1, dir2):
        """Return True if the two trees are recursively identical, False otherwise."""
        c = filecmp.dircmp(dir1, dir2)
        if c.left_only or c.right_only or c.diff_files or c.funny_files:
            return False
        # Recurse into every subdirectory that exists on both sides.
        for name in c.common_dirs:
            if not same(os.path.join(dir1, name), os.path.join(dir2, name)):
                return False
        return True
    
  • 2020-12-23 21:28

    Based on Python issue 12932 and the filecmp documentation, you may use the following example:

    import os
    import filecmp
    
    # Force content comparison instead of comparing only os.stat attributes.
    # This changes the default of cmpfiles' shallow argument for the whole process.
    filecmp.cmpfiles.__defaults__ = (False,)
    
    def _is_same_helper(dircmp):
        if dircmp.left_only or dircmp.right_only or dircmp.diff_files or dircmp.funny_files:
            return False
        for sub_dircmp in dircmp.subdirs.values():
            if not _is_same_helper(sub_dircmp):
                return False
        return True
    
    def is_same(dir1, dir2):
        """
        Recursively compare two directories
        :param dir1: path to first directory 
        :param dir2: path to second directory
        :return: True in case directories are the same, False otherwise
        """
        if not os.path.isdir(dir1) or not os.path.isdir(dir2):
            return False
        dircmp = filecmp.dircmp(dir1, dir2)
        return _is_same_helper(dircmp)
    
  • 2020-12-23 21:35

    The report_full_closure() method is recursive; it prints a comparison report for the two directories and all of their common subdirectories:

    comparison = filecmp.dircmp('/directory1', '/directory2')
    comparison.report_full_closure()
    

    Edit: After the OP's edit, I would say that it's best to just use the other functions in filecmp. I think os.walk is unnecessary; it is simpler to recurse through the lists produced by common_dirs, etc., although for very deeply nested directory trees a recursive implementation could exceed Python's maximum recursion depth.
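
    One way to sidestep the recursion-depth concern is to walk the dircmp objects with an explicit stack. This is only a sketch: the name trees_match_iterative is made up for illustration, and it relies on dircmp's default (shallow) file comparison, so it is no stricter than the recursive versions in the other answers.

```python
import filecmp


def trees_match_iterative(dir1, dir2):
    """Return True if dircmp reports no differences anywhere in the two trees."""
    pending = [filecmp.dircmp(dir1, dir2)]
    while pending:
        node = pending.pop()
        # common_funny holds names whose type differs between the two sides
        if (node.left_only or node.right_only or node.common_funny
                or node.diff_files or node.funny_files):
            return False
        # subdirs maps each common subdirectory name to its own dircmp object
        pending.extend(node.subdirs.values())
    return True
```

    Because the pending list lives on the heap, the depth of the directory tree no longer affects the Python call stack.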

  • 2020-12-23 21:39

    Since a True or False result is all you want, if you have diff installed:

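    from subprocess import Popen, PIPE

    def are_dir_trees_equal(dir1, dir2):
        process = Popen(["diff", "-r", dir1, dir2], stdout=PIPE)
        # communicate() drains stdout; wait() alone can block once the pipe fills up
        process.communicate()
        return process.returncode == 0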
    
  • 2020-12-23 21:41

    dircmp can be recursive: see report_full_closure.

    As far as I know dircmp does not offer a directory comparison function. It would be very easy to write your own, though; use left_only and right_only on dircmp to check that the files in the directories are the same and then recurse on the subdirs attribute.
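
    If you also want to know what differs, the same attributes can be collected instead of just tested. A rough sketch, assuming the default shallow comparison is good enough; list_differences and _collect are hypothetical names, not part of filecmp:

```python
import filecmp
import os


def _collect(node, prefix):
    """Gather relative paths that dircmp flags at this level and below."""
    names = node.left_only + node.right_only + node.diff_files + node.funny_files
    found = [os.path.join(prefix, name) for name in names]
    # subdirs maps each common subdirectory name to a nested dircmp object
    for name, child in node.subdirs.items():
        found.extend(_collect(child, os.path.join(prefix, name)))
    return found


def list_differences(dir1, dir2):
    """Return a sorted list of relative paths that differ between the trees."""
    return sorted(_collect(filecmp.dircmp(dir1, dir2), ""))
```

    An empty list then means the two trees look the same to dircmp.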

  • 2020-12-23 21:44

    filecmp.dircmp is the way to go, but by default it does not always compare the content of files found at the same path in the two directories: files whose os.stat signatures (type, size, modification time) are identical are treated as equal without being read. Since dircmp is a class, you can fix that by subclassing it and overriding its phase3 method, which compares the common files, so that content is compared instead of only os.stat attributes.

    import filecmp
    
    class dircmp(filecmp.dircmp):
        """
        Compare the content of dir1 and dir2. In contrast with filecmp.dircmp, this
        subclass compares the content of files with the same path.
        """
        def phase3(self):
            """
            Find out differences between common files.
            Ensure we are using content comparison with shallow=False.
            """
            fcomp = filecmp.cmpfiles(self.left, self.right, self.common_files,
                                     shallow=False)
            self.same_files, self.diff_files, self.funny_files = fcomp
    

    Then you can use this to return a boolean:

    import os.path
    
    def is_same(dir1, dir2):
        """
        Compare two directory trees content.
        Return False if they differ, True if they are the same.
        """
        compared = dircmp(dir1, dir2)
        if (compared.left_only or compared.right_only or compared.diff_files 
            or compared.funny_files):
            return False
        for subdir in compared.common_dirs:
            if not is_same(os.path.join(dir1, subdir), os.path.join(dir2, subdir)):
                return False
        return True
    

    In case you want to reuse this code snippet, it is hereby dedicated to the Public Domain or the Creative Commons CC0 at your choice (in addition to the default license CC-BY-SA provided by SO).
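
    To see why the shallow=False override matters, here is a small illustrative check with filecmp.cmp. The helper make_lookalike_files is invented for this demonstration; it writes two files with the same size and forces the same modification time, so their os.stat signatures match even though the bytes differ.

```python
import filecmp
import os
import tempfile


def make_lookalike_files():
    """Create two files with equal size and mtime but different bytes."""
    folder = tempfile.mkdtemp()
    first = os.path.join(folder, "first.txt")
    second = os.path.join(folder, "second.txt")
    for path, text in ((first, "aaaa"), (second, "bbbb")):
        with open(path, "w") as handle:
            handle.write(text)
        # Force identical timestamps so the os.stat signatures match.
        os.utime(path, (1000000000, 1000000000))
    return first, second


first, second = make_lookalike_files()
shallow_result = filecmp.cmp(first, second, shallow=True)   # stat signature only
deep_result = filecmp.cmp(first, second, shallow=False)     # reads the bytes
print(shallow_result, deep_result)
```

    The shallow call reports the files as equal while the content comparison does not, which is the gap the subclass above is meant to close.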
