问题
The code below takes an XML file and parses specific elements into a CSV file. Regarding the code I had simpler and different code that had a slightly different out, the code below is as an outcome of a lot help from here.
from xml.etree import ElementTree as ET
from collections import defaultdict
import csv
tree = ET.parse('thexmlfile.xml')
root = tree.getroot()
with open('output.csv', 'w', newline='') as f:
writer = csv.writer(f)
start_nodes = root.findall('.//START')
headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
writer.writerow(headers)
for sn in start_nodes:
row = defaultdict(str)
for k,v in sn.attrib.items():
row[k] = v
for rn in sn.findall('.//Rational'):
row['rational'] = rn.text
for qu in sn.findall('.//Qualify'):
row['qualify'] = qu.text
for ds in sn.findall('.//Description'):
row['description_txt'] = ds.text
row['description_num'] = ds.attrib['num']
# all other tags except set data must be parsed before this.
for st in sn.findall('.//SetData'):
for k,v in st.attrib.items():
row['set_data_'+ str(k)] = v
row_data = [row[i] for i in headers]
writer.writerow(row_data)
row = defaultdict(str)
I'm trying to make that this code goes to a folder that has many XML files and parses them into one single CSV file. Simply said instead of parsing one XML file , do this for multiple XMLs and parse them to one csv file.
What I would normally do is use os.listdir(): . The code would look something like this
directory = 'C:/Users/docs/FolderwithXMLs'
for filename in os.listdir(directory):
if filename.endswith(".xml"):
#Something here
df.to_csv("./output.csv")
continue
else:
continue
I have tried different ways to implement this into the code from above without success until now. Considering that this process should also be fast.
回答1:
Try:
from pathlib import Path
directory = 'C:/Users/docs/FolderwithXMLs'
with open('output.csv', 'w', newline='') as f:
writer = csv.writer(f)
headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
writer.writerow(headers)
xml_files_list = list(map(str,Path(directory).glob('**/*.xml')))
for xml_file in xml_files_list:
tree = ET.parse(xml_file)
root = tree.getroot()
start_nodes = root.findall('.//START')
for sn in start_nodes:
row = defaultdict(str)
# <<<<< Indentation was wrong here
for k,v in sn.attrib.items():
row[k] = v
# Rest of the code here.
Hope that helps.
来源:https://stackoverflow.com/questions/59554463/parse-many-xml-files-to-one-csv-file