Question
File Structure
I have a folder called test_folder that contains several subfolders (named after different fruits, as you'll see in my code below). Each subfolder always contains a metadump.xml file, which is where I extract information from.
Current Approach
I have been able to do this for a single subfolder by specifying its path directly:
import re
in_file = open("C:/.../Downloads/test_folder/apple/metadump.xml")
contents = in_file.read()
in_file.close()
# Capture the text between the opening and closing dc:title tags
title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
print(title)
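The `.group(1)` call raises `AttributeError` whenever `re.search` finds no match, for example if a file lists the attributes in a different order. Here is a minimal sketch of a helper that returns `None` in that case instead. It keeps the question's pattern unchanged. The names `TITLE_PATTERN` and `extract_title` are hypothetical, chosen only for illustration:

```python
import re

# Same pattern as in the question; it assumes the attributes always appear
# in exactly this order and with this spacing.
TITLE_PATTERN = re.compile(
    r'<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
    r'rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>'
)


def extract_title(contents):
    """Return the captured title text, or None if the pattern is not found."""
    match = TITLE_PATTERN.search(contents)
    return match.group(1) if match else None


sample = ('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
          'rsfieldref="8" rsfieldtype="0">Apple</dc:title>')
print(extract_title(sample))     # Apple
print(extract_title("<root/>"))  # None
```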
Next Steps
I would like to do this extraction on a larger scale. I want to reference only the parent folder C:/.../Downloads/test_folder and have my program find the xml file in each subfolder and extract the desired information, rather than specifying every fruit subfolder individually.
Clarification
Rather than just obtaining a list of subfolders, or a list of the xml files within them, I would like to actually access each subfolder and run this text extraction on the xml file it contains.
Thanks in advance for your help.
Answer 1:
You can use Python's os.walk() to traverse all of the subfolders. Whenever a file is named metadump.xml, the code opens it, extracts your title, and prints the filename and the title:
import os
import re

for root, dirs, files in os.walk(r"C:\...\Downloads\test_folder"):
    for file in files:
        if file == 'metadump.xml':
            # Full path of this subfolder's metadump.xml
            filename = os.path.join(root, file)
            with open(filename) as f_xml:
                contents = f_xml.read()
            title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
            print('{} : {}'.format(filename, title))
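Here is a minimal sketch of the same os.walk approach that collects the results in a dictionary instead of printing them. When a file's title element does not match, it records `None` rather than raising `AttributeError`. The function name `collect_titles` is hypothetical, and reading the files as UTF-8 is an assumption about their encoding:

```python
import os
import re

TITLE_RE = re.compile(
    r'<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
    r'rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>'
)


def collect_titles(parent_folder, target="metadump.xml"):
    """Walk parent_folder recursively and map each target file's path to its title."""
    titles = {}
    for root, _dirs, files in os.walk(parent_folder):
        if target in files:
            path = os.path.join(root, target)
            # Assumption: the XML files are UTF-8 encoded.
            with open(path, encoding="utf-8") as f_xml:
                match = TITLE_RE.search(f_xml.read())
            # Store None when the title element is missing instead of crashing.
            titles[path] = match.group(1) if match else None
    return titles
```

For example, `collect_titles(r"C:\...\Downloads\test_folder")` would return a dictionary keyed by the full path of each metadump.xml found at any depth.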
Answer 2:
You can use os.listdir as follows:
import os
import re

parent_folder = 'C:/.../Downloads/test_folder'
subfolders = os.listdir(parent_folder)
for subfolder in subfolders:
    # Build the path to this subfolder's metadump.xml and read it
    in_file = open(parent_folder + '/' + subfolder + '/metadump.xml')
    contents = in_file.read()
    in_file.close()
    title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
    print(title)
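os.listdir also returns plain files, and a subfolder might not contain metadump.xml. In either case the open() call above fails. Here is a minimal sketch that uses os.scandir to skip non-directory entries and subfolders without the file. Like the answer, it looks only one level deep. The name `titles_one_level` is hypothetical, and UTF-8 encoding is assumed:

```python
import os
import re

TITLE_RE = re.compile(
    r'<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
    r'rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>'
)


def titles_one_level(parent_folder, target="metadump.xml"):
    """Check only the immediate subfolders of parent_folder (no recursion)."""
    results = {}
    with os.scandir(parent_folder) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue  # skip stray files sitting next to the subfolders
            path = os.path.join(entry.path, target)
            if not os.path.isfile(path):
                continue  # this subfolder has no metadump.xml
            # Assumption: the XML files are UTF-8 encoded.
            with open(path, encoding="utf-8") as f_xml:
                match = TITLE_RE.search(f_xml.read())
            # Key by subfolder name, e.g. "apple"
            results[entry.name] = match.group(1) if match else None
    return results
```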
Answer 3:
If you are not sure how many subfolders your folder contains, you can use the glob module. With recursive=True, the ** pattern matches any number of nested subfolders in your folder C:/../Downloads/test_folder/, and glob returns a list of all the metadump.xml files:
import re
import glob

# recursive=True lets ** match any number of nested folders
for file in glob.glob("C:/**/Downloads/test_folder/**/metadump.xml", recursive=True):
    with open(file) as in_file:
        contents = in_file.read()
    title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
    print(title)
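Because the pattern above starts with C:/**/, glob may have to scan every directory on the drive to find a matching Downloads/test_folder. If you know the parent folder's full path, anchoring the search there is cheaper. Here is a minimal sketch using pathlib's Path.rglob, which searches recursively as if "**/" were prepended to the pattern. The name `titles_via_rglob` is hypothetical, UTF-8 encoding is assumed, and results are keyed by each file's folder relative to the parent:

```python
import re
from pathlib import Path

TITLE_RE = re.compile(
    r'<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" '
    r'rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>'
)


def titles_via_rglob(parent_folder, target="metadump.xml"):
    """Map each folder (relative to parent_folder) to the title in its target file."""
    parent = Path(parent_folder)
    results = {}
    # rglob(target) behaves like glob("**/" + target): it searches every depth.
    for path in sorted(parent.rglob(target)):
        # Assumption: the XML files are UTF-8 encoded.
        match = TITLE_RE.search(path.read_text(encoding="utf-8"))
        key = path.relative_to(parent).parent.as_posix()  # e.g. "apple"
        results[key] = match.group(1) if match else None
    return results
```

Using the relative folder as the key keeps nested subfolders with the same name from overwriting each other.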
Answer 4:
This might help you:
import os

for root, dirs, files in os.walk("/mydir"):
    for file in files:
        # Print the full path of every .xml file at any depth
        if file.endswith(".xml"):
            print(os.path.join(root, file))
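Once the XML files are found, you can parse them with the standard-library xml.etree.ElementTree instead of using a regex. This is a swapped-in technique, not something the answers above propose. Here is a minimal sketch. It assumes each file is well-formed and declares the `dc` namespace prefix, since otherwise parsing fails with `ParseError`. It also assumes Python 3.8+ for the `{*}` namespace wildcard. `title_from_xml` and `xml_titles` are hypothetical names:

```python
import os
import xml.etree.ElementTree as ET


def title_from_xml(path):
    """Return the text of the first title element whose rsfieldtitle is "Title"."""
    root = ET.parse(path).getroot()
    # "{*}title" matches <title> in any namespace (Python 3.8+), so the
    # dc namespace URI does not need to be known in advance.
    for elem in root.iterfind(".//{*}title"):
        if elem.get("rsfieldtitle") == "Title":
            return elem.text
    return None


def xml_titles(parent_folder):
    """Walk parent_folder as in the answer above and parse every .xml file."""
    titles = {}
    for root, _dirs, files in os.walk(parent_folder):
        for file in files:
            if file.endswith(".xml"):
                path = os.path.join(root, file)
                titles[path] = title_from_xml(path)
    return titles
```

Unlike the regex, this does not depend on the order or spacing of the attributes.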
Source: https://stackoverflow.com/questions/50721525/accessing-text-file-within-subfolder