Python read xml with related child elements

主宰稳场 submitted on 2019-12-06 18:12:26

Here's an example that extracts the data from the XML.

code.py:

#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp


file_name = "a.xml"


def get_product_sn(product_node):
    for product_node_child in list(product_node):
        if product_node_child.tag == "serialNumber":
            return product_node_child.attrib.get("value", None)
    return None


def get_parts_data(parts_node):
    ret = list()
    for parts_node_child in list(parts_node):
        attrs = parts_node_child.attrib
        ret.append({"number": attrs.get("number", None), "name": attrs.get("name", None), "index": attrs.get("index", None)})
    return ret


def get_visit_node_data(visit_node):
    ret = dict()
    for visit_node_child in list(visit_node):
        if visit_node_child.tag == "general":
            for general_node_child in list(visit_node_child):
                if general_node_child.tag == "startDateTime":
                    ret["startDateTime"] = general_node_child.text
                elif general_node_child.tag == "endDateTime":
                    ret["endDateTime"] = general_node_child.text
        elif visit_node_child.tag == "parts":
            ret["parts"] = get_parts_data(visit_node_child)
    return ret


def get_node_data(node):
    ret = {"visits": list()}
    for node_child in list(node):
        if node_child.tag == "product":
            ret["serialNumber"] = get_product_sn(node_child)
        elif node_child.tag == "visits":
            for visits_node_child in list(node_child):
                ret["visits"].append(get_visit_node_data(visits_node_child))
    return ret


def main():
    tree = ET.parse(file_name)
    root_node = tree.getroot()
    data = get_node_data(root_node)
    pp(data)


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes:

  • It treats the XML as a tree and mirrors its structure, so if the XML structure changes, the code has to be adapted as well
  • It's designed to be general: get_node_data can be called on any node that has 2 children, product and visits. In our case that's the root node itself, but in the real world there could be a sequence of such nodes, each with those 2 children
  • It's designed to be error-tolerant: if the XML is incomplete, it gets as much data as it can. I chose this (greedy) approach over throwing an exception at the first error
  • As I haven't worked with pandas, instead of populating a DataFrame I simply return a Python dictionary (JSON-like); I think converting it to a DataFrame shouldn't be hard
  • I've run it with Python 2.7 and Python 3.5
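
One way to get closer to a table is to flatten the nested dictionary into one row per (visit, part) pair. The snippet below is only a sketch under assumptions: flatten_visits is a hypothetical helper that is not part of code.py, and sample simply repeats the values from the output shown in this answer.

```python
def flatten_visits(data):
    """Return one flat dict per (visit, part) pair (hypothetical helper)."""
    rows = []
    for visit in data.get("visits", []):
        # A visit without parts still yields one row, with the part fields set to None.
        parts = visit.get("parts") or [{}]
        for part in parts:
            rows.append({
                "serialNumber": data.get("serialNumber"),
                "startDateTime": visit.get("startDateTime"),
                "endDateTime": visit.get("endDateTime"),
                "number": part.get("number"),
                "name": part.get("name"),
                "index": part.get("index"),
            })
    return rows


# Same values as in the output of code.py.
sample = {
    "serialNumber": "764000606",
    "visits": [
        {"startDateTime": "2014-01-10T12:22:39.166Z",
         "endDateTime": "2014-03-11T13:51:31.480Z",
         "parts": [{"number": "03081", "name": "WSSA", "index": "0016"}]},
        {"startDateTime": "2013-01-10T12:22:39.166Z",
         "endDateTime": "2013-03-11T13:51:31.480Z",
         "parts": [{"number": "02081", "name": "PSSF", "index": "0017"}]},
    ],
}

rows = flatten_visits(sample)
for row in rows:
    print(row)
```

A list of flat dictionaries like rows is a shape that pandas.DataFrame accepts directly, so pandas.DataFrame(rows) should be enough if pandas is installed; I haven't tested that part here.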

The output is a dictionary containing 2 keys (pretty-printed for readability):

  • serialNumber - the serial number (obviously)
  • visits (since it's a dictionary, I had to place this data "under" a key) - a list of dictionaries each containing data from a visit node

Output:

(py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045049761>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32

{'serialNumber': '764000606',
 'visits': [{'endDateTime': '2014-03-11T13:51:31.480Z',
             'parts': [{'index': '0016', 'name': 'WSSA', 'number': '03081'}],
             'startDateTime': '2014-01-10T12:22:39.166Z'},
            {'endDateTime': '2013-03-11T13:51:31.480Z',
             'parts': [{'index': '0017', 'name': 'PSSF', 'number': '02081'}],
             'startDateTime': '2013-01-10T12:22:39.166Z'}]}


@EDIT0: Added handling for multiple part nodes, as requested in one of the comments. That functionality has been moved to get_parts_data. Now each entry in the visits list has a parts key whose value is a list of dictionaries, one per part node (the provided XML doesn't contain multiple part nodes).
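
To illustrate the multiple part node case, here is a small self-contained sketch. The parts fragment is an assumption made up for the demo: the first two part nodes reuse values from the output above, and the third one (name "DEMO") is invented to show what happens when attributes are missing. get_parts_data follows the same logic as in code.py, written slightly more compactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical <parts> fragment. The first two part nodes reuse values from the
# output above; the third is invented here to show a node with missing attributes.
PARTS_XML = """
<parts>
    <part number="03081" name="WSSA" index="0016"/>
    <part number="02081" name="PSSF" index="0017"/>
    <part name="DEMO"/>
</parts>
"""


def get_parts_data(parts_node):
    """Same logic as get_parts_data in code.py: one dict per child node."""
    ret = []
    for part_node in parts_node:
        attrs = part_node.attrib
        # Missing attributes become None instead of raising an error.
        ret.append({key: attrs.get(key) for key in ("number", "name", "index")})
    return ret


parts_list = get_parts_data(ET.fromstring(PARTS_XML))
for part in parts_list:
    print(part)
```

Each part node becomes its own dictionary, and the incomplete one still shows up, with None for the attributes it lacks.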

Try the following:

import xml.dom.minidom as minidom
doc = minidom.parse('filename')
memoryElem = doc.getElementsByTagName('part')[0]

# print() with parentheses works on both Python 2.7 and Python 3
print(memoryElem.getAttribute('number'))
print(memoryElem.getAttribute('name'))
print(memoryElem.getAttribute('index'))

Hope this helps.
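
The snippet above only reads the first part element. If there are several, a loop over getElementsByTagName covers all of them. This is a sketch under assumptions: XML_TEXT is a made-up fragment that reuses the attribute values from the first answer's output, and it is parsed with parseString so that the example doesn't depend on a file.

```python
import xml.dom.minidom as minidom

# Hypothetical fragment, used here instead of a file on disk.
XML_TEXT = """<parts>
    <part number="03081" name="WSSA" index="0016"/>
    <part number="02081" name="PSSF" index="0017"/>
</parts>"""

doc = minidom.parseString(XML_TEXT)

part_rows = []
for part_elem in doc.getElementsByTagName("part"):
    # getAttribute returns an empty string when the attribute is absent.
    part_rows.append({key: part_elem.getAttribute(key)
                      for key in ("number", "name", "index")})

for row in part_rows:
    print(row)
```

Note that getElementsByTagName searches the whole document, so the part elements of all visits end up in one list; the first answer keeps them grouped per visit instead.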
