How to traverse through the files in a directory?

离开以前 2020-11-30 11:34

I have a directory called logfiles. I want to process each file inside this directory using a Python script.

for file in directory:
    # do something
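For the simple, non-recursive case the question describes, one way to fill in that loop is the standard library's pathlib. This is a minimal sketch. It assumes the directory path is passed in as a string (for example "logfiles", relative to the current working directory). process_file and process_directory are hypothetical names, and process_file stands in for whatever "do something" means.

```python
from pathlib import Path
from typing import List


def process_file(path: Path) -> None:
    # Hypothetical placeholder for the real per-file work ("do something")
    print(f"processing {path.name}")


def process_directory(directory: str) -> List[Path]:
    """Call process_file on each regular file directly inside directory (no recursion)."""
    processed = []
    # iterdir() yields entries in arbitrary order, so sort for a predictable order
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():  # skip subdirectories and other non-files
            process_file(path)
            processed.append(path)
    return processed
```

Calling process_directory("logfiles") handles only the files directly inside logfiles. Files in subdirectories are not visited.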
         


        
9 answers
  •  爱一瞬间的悲伤
    2020-11-30 11:49

    This is an update of my previous version: it now accepts glob-style wildcards in the exclude lists. The function walks into every subdirectory of the given path and returns a list of all files in those directories. Each returned path is prefixed with the given root, so the paths are absolute when root is absolute and relative otherwise. It works like Matheus' answer, with optional exclude lists.

    For example:

    files = get_files_recursive('/some/path')
    files = get_files_recursive('/some/path', f_exclude_list=['.cache', '*.bak'])
    files = get_files_recursive('C:\\Users', d_exclude_list=['AppData', 'Temp'])
    files = get_files_recursive('/some/path', ext_exclude_list=['.log', '.db'])
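    In the code below, file and directory patterns in these examples are matched with fnmatch.fnmatch, and extensions are compared against os.path.splitext. The following sketch shows what those two calls do with patterns like the ones above. The file names are made up for illustration, and is_excluded is a hypothetical helper that mirrors glob_path_match. Note that fnmatch applies os.path.normcase, so matching is case-insensitive on Windows but case-sensitive on POSIX systems (including macOS).

```python
import os
from fnmatch import fnmatch


def is_excluded(name, pattern_list):
    # Same test as glob_path_match in the answer: True if any glob pattern matches
    return any(fnmatch(name, pattern) for pattern in pattern_list)


patterns = [".cache", "*.bak"]  # the f_exclude_list from the second example

print(is_excluded("notes.bak", patterns))  # True: '*' matches 'notes'
print(is_excluded(".cache", patterns))     # True: exact match
print(is_excluded("notes.txt", patterns))  # False: no pattern matches

# ext_exclude_list is compared with os.path.splitext(name)[1], which keeps only the last suffix
print(os.path.splitext("server.log")[1])     # .log
print(os.path.splitext("backup.tar.gz")[1])  # .gz, so '.tar.gz' in ext_exclude_list never matches
print(os.path.splitext(".bashrc")[1])        # empty string: a leading dot is not an extension
```

    Multi-part extensions such as '.tar.gz' therefore need to go into f_exclude_list as a wildcard like '*.tar.gz' instead.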
    

    Hope this helps someone, just as the original answer in this thread helped me :)

    import os
    from fnmatch import fnmatch
    
    def glob_path_match(path, pattern_list):
        """
        Check whether path matches any of the glob-style wildcard patterns in pattern_list
        :param path: path of file / directory
        :param pattern_list: list of wildcard patterns to check against
        :return: True if at least one pattern matches, else False
        """
        return any(fnmatch(path, pattern) for pattern in pattern_list)
    
    
    def get_files_recursive(root, d_exclude_list=None, f_exclude_list=None, ext_exclude_list=None, primary_root=None):
        """
        Walk a path to recursively find files
        Modified version of https://stackoverflow.com/a/24771959/2635443 that includes exclusion lists
        and accepts glob style wildcards on files and directories
        :param root: path to explore
        :param d_exclude_list: list of directory paths, relative to root, to exclude (glob wildcards allowed)
        :param f_exclude_list: list of file names (without paths) to exclude (glob wildcards allowed)
        :param ext_exclude_list: list of file extensions to exclude, e.g. ['.log', '.bak']
        :param primary_root: only used internally to track the root-relative path during recursion; don't pass an argument here
        :return: list of files found under root, each path prefixed with root
        """
    
        if d_exclude_list is not None:
            # Normalise separators in the directory exclusion patterns (this is repeated at every recursion level)
            d_exclude_list = [os.path.normpath(d) for d in d_exclude_list]
        else:
            d_exclude_list = []
        if f_exclude_list is None:
            f_exclude_list = []
        if ext_exclude_list is None:
            ext_exclude_list = []
    
        files = [os.path.join(root, f) for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))
                 and not glob_path_match(f, f_exclude_list) and os.path.splitext(f)[1] not in ext_exclude_list]
        dirs = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))]
        for d in dirs:
            p_root = os.path.join(primary_root, d) if primary_root is not None else d
            if not glob_path_match(p_root, d_exclude_list):
                files_in_d = get_files_recursive(os.path.join(root, d), d_exclude_list, f_exclude_list, ext_exclude_list,
                                                 primary_root=p_root)
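                # The recursive call already returns paths prefixed with os.path.join(root, d),
                # so add them as-is; joining root again would duplicate it for relative roots
                files.extend(files_in_d)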
        return files
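    The same traversal can also be written with os.walk, which handles the recursion itself. Removing excluded directories from its dirnames list in place stops it from descending into them. The following is a sketch of that alternative technique, not the answer's own code. get_files_walk is a hypothetical name, and it assumes the same exclusion semantics as above:

    * directory patterns are matched against root-relative paths,
    * file patterns are matched against bare names,
    * extensions are compared via os.path.splitext.

    One difference is that os.walk does not descend into symlinked directories unless followlinks=True is passed.

```python
import os
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def _matches_any(path: str, patterns: Iterable[str]) -> bool:
    # True if path matches at least one glob-style pattern
    return any(fnmatch(path, p) for p in patterns)


def get_files_walk(root: str,
                   d_exclude_list: Optional[List[str]] = None,
                   f_exclude_list: Optional[List[str]] = None,
                   ext_exclude_list: Optional[List[str]] = None) -> List[str]:
    """Return files under root (paths prefixed with root), skipping excluded dirs, names and extensions."""
    # Normalise separators once instead of at every recursion level
    d_patterns = [os.path.normpath(d) for d in (d_exclude_list or [])]
    f_patterns = f_exclude_list or []
    exts = set(ext_exclude_list or [])

    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them;
        # patterns are compared with paths relative to the starting root
        dirnames[:] = [d for d in dirnames
                       if not _matches_any(os.path.relpath(os.path.join(dirpath, d), root), d_patterns)]
        for name in filenames:
            if _matches_any(name, f_patterns) or os.path.splitext(name)[1] in exts:
                continue
            files.append(os.path.join(dirpath, name))
    return files
```

    For example, get_files_walk('C:\\Users', d_exclude_list=['AppData', 'Temp']) corresponds to the third example call above. As with os.listdir, the order of the returned paths is not guaranteed.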
    
