Parse many XML files to one CSV file

谁说胖子不能爱 submitted on 2020-01-15 09:53:09

Question


The code below takes an XML file and parses specific elements into a CSV file. I previously had simpler code that produced slightly different output; the code below is the result of a lot of help from here.

from xml.etree import ElementTree as ET
from collections import defaultdict
import csv

tree = ET.parse('thexmlfile.xml')
root = tree.getroot()

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    start_nodes = root.findall('.//START')
    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)
    for sn in start_nodes:
        row = defaultdict(str)

        for k,v in sn.attrib.items():
            row[k] = v

        for rn in sn.findall('.//Rational'):
            row['rational'] = rn.text

        for qu in sn.findall('.//Qualify'):
            row['qualify'] = qu.text

        for ds in sn.findall('.//Description'):
            row['description_txt'] = ds.text
            row['description_num'] = ds.attrib['num']

        # all other tags except set data must be parsed before this.
        for st in sn.findall('.//SetData'):
            for k,v in st.attrib.items():
                row['set_data_'+ str(k)] = v
            row_data = [row[i] for i in headers]
            writer.writerow(row_data)
            row = defaultdict(str)

I'm trying to make this code go through a folder that contains many XML files and parse them all into one single CSV file. Put simply: instead of parsing one XML file, do this for multiple XML files and write the results to one CSV file.

What I would normally do is use os.listdir(). The code would look something like this:

import os

directory = 'C:/Users/docs/FolderwithXMLs'
for filename in os.listdir(directory):
    if filename.endswith(".xml"):
        xml_path = os.path.join(directory, filename)
        # Something here: parse xml_path and write its rows to output.csv

I have tried several ways to work this into the code above, so far without success. The process should also be fast.


Answer 1:


Try:


import csv
from collections import defaultdict
from pathlib import Path
from xml.etree import ElementTree as ET

directory = 'C:/Users/docs/FolderwithXMLs'

# Open the CSV once, outside the file loop, so every XML file appends to it.
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']

    writer.writerow(headers)

    # Recursively collect every .xml file under the directory.
    xml_files_list = list(map(str,Path(directory).glob('**/*.xml')))
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()

        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)

            # The per-START code from the question must be indented to sit inside this loop.
            for k,v in sn.attrib.items():
                row[k] = v

            # Rest of the code here.

Hope that helps.
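
For reference, here is one way the pieces above could fit together into a complete script. This is only a sketch: the names `HEADERS`, `rows_from_xml` and `merge_xml_to_csv` are made up for illustration, and the parsing logic is copied from the question's code (including the fact that only the first SetData row of each START node carries the id, rational, qualify and description columns).

```python
import csv
from collections import defaultdict
from pathlib import Path
from xml.etree import ElementTree as ET

# Column order taken from the question's code.
HEADERS = ['id', 'service_code', 'rational', 'qualify', 'description_num',
           'description_txt', 'set_data_xin', 'set_data_xax',
           'set_data_value', 'set_data_x']


def rows_from_xml(xml_path):
    """Yield one CSV row (a list) per SetData element found in one XML file."""
    root = ET.parse(str(xml_path)).getroot()
    for sn in root.findall('.//START'):
        row = defaultdict(str)
        row.update(sn.attrib)

        for rn in sn.findall('.//Rational'):
            row['rational'] = rn.text
        for qu in sn.findall('.//Qualify'):
            row['qualify'] = qu.text
        for ds in sn.findall('.//Description'):
            row['description_txt'] = ds.text
            row['description_num'] = ds.attrib['num']

        # All other tags must be read before SetData, as in the question.
        for st in sn.findall('.//SetData'):
            for k, v in st.attrib.items():
                row['set_data_' + k] = v
            yield [row[h] for h in HEADERS]
            # Same behaviour as the question's code: start the next row empty.
            row = defaultdict(str)


def merge_xml_to_csv(directory, output_csv):
    """Write the rows of every *.xml file under `directory` into one CSV file."""
    with open(output_csv, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(HEADERS)
        # sorted() only makes the file order predictable; it is not required.
        for xml_path in sorted(Path(directory).glob('**/*.xml')):
            writer.writerows(rows_from_xml(xml_path))
```

Calling `merge_xml_to_csv('C:/Users/docs/FolderwithXMLs', 'output.csv')` would then do the whole job. Because the CSV is opened once and rows are written as each file is parsed, nothing has to be held in memory beyond the file currently being read.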



Source: https://stackoverflow.com/questions/59554463/parse-many-xml-files-to-one-csv-file
