Python glob but against a list of strings rather than the filesystem

后端 未结 9 1921
被撕碎了的回忆
被撕碎了的回忆 2021-02-06 21:24

I want to be able to match a pattern in glob format to a list of strings, rather than to actual files in the filesystem. Is there any way to do this, or convert a glob

9条回答
  •  南笙
    南笙 (楼主)
    2021-02-06 21:31

    The glob module uses the fnmatch module for individual path elements.

    That means the path is split into the directory name and the filename, and if the directory name contains meta characters (contains any of the characters [, * or ?) then these are expanded recursively.

    If you have a list of strings that are simple filenames, then just using the fnmatch.filter() function is enough:

    import fnmatch
    
    matching = fnmatch.filter(filenames, pattern)
    

    but if they contain full paths, you need to do more work as the regular expression generated doesn't take path segments into account (wildcards don't exclude the separators nor are they adjusted for cross-platform path matching).

    You can construct a simple trie from the paths, then match your pattern against that:

    import fnmatch
    import glob
    import os.path
    from itertools import product
    
    
    # Cross-Python dictionary views on the keys 
    if hasattr(dict, 'viewkeys'):
        # Python 2
        def _viewkeys(d):
            return d.viewkeys()
    else:
        # Python 3
        def _viewkeys(d):
            return d.keys()
    
    
    def _in_trie(trie, path):
        """Determine if path is completely in trie"""
        current = trie
        for elem in path:
            try:
                current = current[elem]
            except KeyError:
                return False
        return None in current
    
    
    def find_matching_paths(paths, pattern):
        """Produce a list of paths that match the pattern.
    
        * paths is a list of strings representing filesystem paths
        * pattern is a glob pattern as supported by the fnmatch module
    
        """
        if os.altsep:  # normalise
            pattern = pattern.replace(os.altsep, os.sep)
        pattern = pattern.split(os.sep)
    
        # build a trie out of path elements; efficiently search on prefixes
        path_trie = {}
        for path in paths:
            if os.altsep:  # normalise
                path = path.replace(os.altsep, os.sep)
            _, path = os.path.splitdrive(path)
            elems = path.split(os.sep)
            current = path_trie
            for elem in elems:
                current = current.setdefault(elem, {})
            current.setdefault(None, None)  # sentinel
    
        matching = []
    
        current_level = [path_trie]
        for subpattern in pattern:
            if not glob.has_magic(subpattern):
                # plain element, element must be in the trie or there are
                # 0 matches
                if not any(subpattern in d for d in current_level):
                    return []
                matching.append([subpattern])
                current_level = [d[subpattern] for d in current_level if subpattern in d]
            else:
                # match all next levels in the trie that match the pattern
                matched_names = fnmatch.filter({k for d in current_level for k in d}, subpattern)
                if not matched_names:
                    # nothing found
                    return []
                matching.append(matched_names)
                current_level = [d[n] for d in current_level for n in _viewkeys(d) & set(matched_names)]
    
        return [os.sep.join(p) for p in product(*matching)
                if _in_trie(path_trie, p)]
    

    This mouthful can quickly find matches using globs anywhere along the path:

    >>> paths = ['/foo/bar/baz', '/spam/eggs/baz', '/foo/bar/bar']
    >>> find_matching_paths(paths, '/foo/bar/*')
    ['/foo/bar/baz', '/foo/bar/bar']
    >>> find_matching_paths(paths, '/*/bar/b*')
    ['/foo/bar/baz', '/foo/bar/bar']
    >>> find_matching_paths(paths, '/*/[be]*/b*')
    ['/foo/bar/baz', '/foo/bar/bar', '/spam/eggs/baz']
    

提交回复
热议问题