Question
I am working on a script that recursively goes through the subfolders of a main folder and builds a list of files of a certain type. I am having an issue with the script. It is currently set up as follows:
for root, subFolder, files in os.walk(PATH):
    for item in files:
        if item.endswith(".txt"):
            fileNamePath = str(os.path.join(root,subFolder,item))
The problem is that the subFolder variable holds a list of subfolders rather than the folder where the item file is located. I was thinking of running a for loop over the subfolders first and joining the first part of the path, but I figured I'd double-check whether anyone has a better suggestion before that. Thanks for your help!
Answer 1:
You should be using the dirpath, which you call root. The dirnames are supplied so that you can prune them if there are folders that you don't want os.walk to recurse into.
import os
result = [os.path.join(dp, f) for dp, dn, filenames in os.walk(PATH) for f in filenames if os.path.splitext(f)[1] == '.txt']
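For anyone who prefers the explicit loop from the question, here is a minimal sketch of the same idea, including the pruning mentioned above. The function name find_by_extension and the skip_dirs parameter are made up for illustration, and ".git" is only an example of a folder you might want to skip.

```python
import os

def find_by_extension(top, extension=".txt", skip_dirs=(".git",)):
    """Return paths under `top` whose names end in `extension`.

    `skip_dirs` is a hypothetical parameter: directory names listed in it
    are not descended into.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune in place: with the default top-down walk, os.walk will not
        # descend into directories removed from `dirnames`.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if name.endswith(extension):
                # dirpath is the folder that actually contains `name`,
                # so joining the two gives the full path.
                matches.append(os.path.join(dirpath, name))
    return matches
```

Note that the slice assignment (dirnames[:] = ...) matters. Rebinding dirnames to a new list would not affect the walk.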
Edit:
After the latest downvote, it occurred to me that glob is a better tool for selecting by extension.
import os
from glob import glob
result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]
Also a generator version
from itertools import chain
result = (chain.from_iterable(glob(os.path.join(x[0], '*.txt')) for x in os.walk('.')))
Edit2 for Python 3.4+
from pathlib import Path
result = list(Path(".").rglob("*.[tT][xX][tT]"))
Answer 2:
Changed in Python 3.5: Support for recursive globs using “**”.
glob.glob() got a new recursive parameter.
If you want to get every .txt file under my_path (recursively including subdirs):
import glob
files = glob.glob(my_path + '/**/*.txt', recursive=True)
# my_path/ the dir
# **/ every file and dir under my_path
# *.txt every file that ends with '.txt'
If you need an iterator you can use iglob as an alternative, with the same pattern and recursive=True:
for file in glob.iglob(my_path + '/**/*.txt', recursive=True):
    print(file)
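One reason to prefer the iterator is that you can stop early without building the whole list. A small sketch, assuming you only want the first few matches (the function name first_txt_files and the limit parameter are invented for this example):

```python
import glob
import itertools

def first_txt_files(my_path, limit=3):
    """Return at most `limit` .txt paths found under my_path."""
    # iglob yields matches lazily, so islice can stop after `limit` items
    # instead of collecting every match first.
    matches = glob.iglob(my_path + '/**/*.txt', recursive=True)
    return list(itertools.islice(matches, limit))
```

The order of the results is whatever order glob happens to produce, so treat this as "any few matches", not "the first few alphabetically".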
Answer 3:
I will translate John La Rooy's list comprehension to nested for's, just in case anyone else has trouble understanding it.
result = [y for x in os.walk(PATH) for y in glob(os.path.join(x[0], '*.txt'))]
Should be equivalent to:
import os
import glob
result = []
for x in os.walk(PATH):
    for y in glob.glob(os.path.join(x[0], '*.txt')):
        result.append(y)
Here's the documentation for list comprehension and the functions os.walk and glob.glob.
Answer 4:
It's not the most pythonic answer, but I'll put it here for fun because it's a neat lesson in recursion.
import os

def find_files(files, dirs=(), extensions=()):
    # Breadth-first: collect the children of every entry in dirs, then recurse.
    new_dirs = []
    for d in dirs:
        try:
            new_dirs += [os.path.join(d, f) for f in os.listdir(d)]
        except OSError:
            # os.listdir failed, so treat d as a file and check its extension.
            if os.path.splitext(d)[1] in extensions:
                files.append(d)
    if new_dirs:
        find_files(files, new_dirs, extensions)
On my machine I have two folders, root and root2
mender@multivax ]ls -R root root2
root:
temp1 temp2
root/temp1:
temp1.1 temp1.2
root/temp1/temp1.1:
f1.mid
root/temp1/temp1.2:
f.mi f.mid
root/temp2:
tmp.mid
root2:
dummie.txt temp3
root2/temp3:
song.mid
Let's say I want to find all .txt and all .mid files in either of these directories. Then I can just do:
files = []
find_files( files, dirs=['root','root2'], extensions=['.mid','.txt'] )
print(files)
#['root2/dummie.txt',
# 'root/temp2/tmp.mid',
# 'root2/temp3/song.mid',
# 'root/temp1/temp1.1/f1.mid',
# 'root/temp1/temp1.2/f.mid']
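The function above detects files by letting os.listdir fail. As a rough alternative sketch, the same search can be written with an explicit stack and os.path.isdir, which avoids both the recursion and the reliance on the exception. The name find_files_iterative is made up here, and the results come back in a different order than in the output above.

```python
import os

def find_files_iterative(dirs, extensions):
    """Return files under `dirs` whose extension is in `extensions`."""
    files = []
    pending = list(dirs)  # entries still to be examined
    while pending:
        current = pending.pop()
        if os.path.isdir(current):
            # Queue every child of this directory for a later iteration.
            pending.extend(os.path.join(current, name)
                           for name in os.listdir(current))
        elif os.path.splitext(current)[1] in extensions:
            files.append(current)
    return files
```

With the folders shown above, find_files_iterative(['root', 'root2'], ['.mid', '.txt']) should find the same five files.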
Answer 5:
The new pathlib library simplifies this to one line:
from pathlib import Path
result = list(Path(PATH).glob('**/*.txt'))
You can also use the generator version:
from pathlib import Path
for file in Path(PATH).glob('**/*.txt'):
    pass
This returns Path objects, which you can use for pretty much anything, or get the file name as a string by file.name.
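To make the Path object part a bit more concrete, here is a small sketch that pulls a few pieces out of each result. The function name describe_txt_files is invented for this example. str(), .name and .parent are standard pathlib features.

```python
from pathlib import Path

def describe_txt_files(top):
    """Yield (full path, file name, parent folder name) for each .txt file."""
    for path in Path(top).glob("**/*.txt"):
        # glob can also match directories whose names end in .txt,
        # so keep only regular files.
        if path.is_file():
            # str(path) is the whole path, path.name only the last component.
            yield str(path), path.name, path.parent.name
```

So file.name gives you just the file name, while str(file) gives you the full path as a string.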
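Answer 6:
The recursive parameter is new in Python 3.5, so this won't work on Python 2.7. Here is an example that uses raw strings (r'...'), so a Windows path can be written as is, without escaping the backslashes. On Linux or macOS, use forward slashes in the path and the pattern instead.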
import glob
mypath=r"C:\Users\dj\Desktop\nba"
files = glob.glob(mypath + r'\**\*.py', recursive=True)
# print(files) # as list
for f in files:
    print(f) # nice looking single line per file
Note: This lists every matching file, no matter how deeply it is nested.
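If the same script has to run on both Windows and Linux, one option is to let os.path.join build the pattern so that no separator is hard-coded. This is only a sketch, and the function name find_py_files is made up:

```python
import glob
import os

def find_py_files(base_dir):
    """Return every .py file under base_dir, using the platform's separator."""
    # os.path.join inserts "\\" on Windows and "/" elsewhere.
    pattern = os.path.join(base_dir, "**", "*.py")
    return glob.glob(pattern, recursive=True)
```

On Windows you would still pass the base directory as a raw string, for example find_py_files(r"C:\Users\dj\Desktop\nba").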
Answer 7:
This function recursively collects only files (not directories) into a list. Hope this helps.
import os

def ls_files(dir):
    files = list()
    for item in os.listdir(dir):
        abspath = os.path.join(dir, item)
        try:
            if os.path.isdir(abspath):
                files = files + ls_files(abspath)
            else:
                files.append(abspath)
        except FileNotFoundError as err:
            print('invalid directory\n', 'Error: ', err)
    return files
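If the tree is large, a generator avoids building intermediate lists at every level. Here is a rough sketch of the same recursion using os.scandir (available since Python 3.5). The name iter_files is made up, and not following symlinked directories is my own assumption, not something the function above does.

```python
import os

def iter_files(directory):
    """Yield the path of every entry under `directory` that is not a directory."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # Assumption: symlinks to directories are not followed,
            # they are yielded like ordinary files.
            if entry.is_dir(follow_symlinks=False):
                yield from iter_files(entry.path)
            else:
                yield entry.path
```

Wrap it in list(iter_files(some_dir)) if you need an actual list.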
Answer 8:
You can do it this way to get a list of file paths. The paths are absolute as long as the path you pass in is absolute.
import os

def list_files_recursive(path):
    """
    Function that receives as a parameter a directory path
    :return: list of file paths found under path
    """
    files = []
    # r = root, d = directories, f = files
    for r, d, f in os.walk(path):
        for file in f:
            files.append(os.path.join(r, file))
    return files

if __name__ == '__main__':
    result = list_files_recursive('/tmp')
    print(result)
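If you want absolute paths even when the caller passes a relative path such as '.', one way is to normalise the starting directory first. A minimal sketch (the name list_files_absolute is invented for this example):

```python
import os

def list_files_absolute(path):
    """Return absolute paths of all files under `path`."""
    # os.walk builds each dirpath from the top directory it was given,
    # so making the top absolute makes every result absolute.
    top = os.path.abspath(path)
    return [os.path.join(dirpath, name)
            for dirpath, _dirnames, filenames in os.walk(top)
            for name in filenames]
```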
Source: https://stackoverflow.com/questions/18394147/recursive-sub-folder-search-and-return-files-in-a-list-python