Python: namespaces in xml ElementTree (or lxml)

删除回忆录丶 submitted on 2019-12-01 16:35:20

Question


I want to load a legacy XML file, manipulate it, and save it.

Here is my code:

from xml.etree import cElementTree as ET
NS = "{http://www.somedomain.com/XI/Traffic/10}"

def fix_xml(filename):
    f = ET.parse(filename)
    root = f.getroot()
    # every tag name has to carry the "{uri}" prefix to match
    eventlist = root.findall("%(ns)sEvent" % {'ns': NS})
    xpath = "%(ns)sEventDetail/%(ns)sEventDescription" % {'ns': NS}
    for event in eventlist:
        desc = event.find(xpath)
        desc.text = desc.text.upper()  # do some editing to the text.

    # attempt to keep the original default namespace on output
    ET.ElementTree(root, nsmap=NS).write("out.xml", encoding="utf-8")


fix_xml("test.xml")

The file I load contains:

xmlns="http://www.somedomain.com/XI/Traffic/10"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.somedomain.com/XI/Traffic/10 10.xds"

at the root tag.

I have the following problems, related to namespace:

  • As you can see, I have to put the namespace at the beginning of every tag name to retrieve a child.
  • The generated XML file doesn't have <?xml version="1.0" encoding="utf-8"?> at the beginning.
  • The tags in the output look like <ns0:eventDescription>, while I need them as in the original, <eventDescription>, without a namespace prefix.

How can these be solved?


Answer 1:


Have a look at the lxml tutorial section on namespaces, and also at the effbot article about namespaces in ElementTree.

Problem 1: Put up with it, like everybody else does. Instead of "%(ns)sEvent" % {'ns': NS}, try NS + "Event".
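A minimal sketch of the concatenation approach with the standard library's xml.etree.ElementTree (the cElementTree module used in the question was deprecated in Python 3.3 and removed in 3.9). The sample document and its Traffic root element are hypothetical, modelled on the question. On Python 3.8 and later, the {*} wildcard in a path also matches a local name in any namespace, which takes some of the pain out of "putting up with it".

```python
import xml.etree.ElementTree as ET

NS = "{http://www.somedomain.com/XI/Traffic/10}"

# Hypothetical input; the root element name "Traffic" is an assumption.
SAMPLE = """<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">
  <Event><EventDetail><EventDescription>road closed</EventDescription></EventDetail></Event>
  <Event><EventDetail><EventDescription>lane open</EventDescription></EventDetail></Event>
</Traffic>"""

root = ET.fromstring(SAMPLE)

# Plain string concatenation builds the fully qualified "{uri}tag" names.
events = root.findall(NS + "Event")
path = NS + "EventDetail/" + NS + "EventDescription"
descriptions = [event.find(path).text for event in events]

# Python 3.8+: "{*}name" matches "name" in any namespace (or none).
wildcard = [d.text for d in root.iterfind("{*}Event/{*}EventDetail/{*}EventDescription")]

print(descriptions)              # ['road closed', 'lane open']
print(wildcard == descriptions)  # True
```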

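Problem 2: By default, the XML declaration is written only if it is required, i.e. when the output encoding is neither UTF-8 nor US-ASCII. You can force it by passing xml_declaration=True to your write() call. At the time of writing this was lxml only; the standard library's ElementTree.write() accepts it as of Python 2.7.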
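A short sketch with the standard library, writing to in-memory buffers so the two outputs can be compared. With the default xml_declaration=None and encoding="utf-8", no declaration is emitted, which matches what the question observed; xml_declaration=True forces it.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<root><child>text</child></root>"))

# Default (xml_declaration=None): omitted because the encoding is UTF-8.
plain = io.BytesIO()
tree.write(plain, encoding="utf-8")

# xml_declaration=True writes the declaration regardless of the encoding.
forced = io.BytesIO()
tree.write(forced, encoding="utf-8", xml_declaration=True)

print(plain.getvalue())                   # b'<root><child>text</child></root>'
print(forced.getvalue().splitlines()[0])  # the <?xml ...?> declaration line
```

Note that ElementTree writes the declaration with single quotes; XML treats single and double quotes in the declaration as equivalent.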

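Problem 3: The nsmap argument appears to be lxml-only, and even in lxml it is accepted when creating elements (Element(), SubElement()) rather than by ElementTree(). AFAICT it needs a mapping, not a string. Try nsmap={None: "http://www.somedomain.com/XI/Traffic/10"}, i.e. the key None for the default namespace and the bare URI without the braces that NS carries. The effbot article has a section describing a workaround for this.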
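For the standard library, one possible workaround (a sketch, not the effbot article's own code) is the default_namespace argument of ElementTree.write(), available in Python 2.7 and 3.x. It emits the URI as a plain xmlns="..." declaration, so tags come out unprefixed instead of as ns0:.... ET.register_namespace("", uri) is another option, but it changes module-global state. The Traffic/Event document below is hypothetical. If the tree contains any non-qualified names, write() raises ValueError when default_namespace is set.

```python
import io
import xml.etree.ElementTree as ET

NS_URI = "http://www.somedomain.com/XI/Traffic/10"

# Hypothetical document in the default namespace, as in the question.
SAMPLE = ('<Traffic xmlns="%s"><Event><EventDescription>road closed'
          '</EventDescription></Event></Traffic>' % NS_URI)
tree = ET.ElementTree(ET.fromstring(SAMPLE))

# Without help, the serializer invents a prefix: <ns0:Traffic xmlns:ns0="...">
prefixed = ET.tostring(tree.getroot(), encoding="unicode")

# default_namespace writes xmlns="..." once and leaves the tags unprefixed.
buf = io.BytesIO()
tree.write(buf, encoding="utf-8", xml_declaration=True, default_namespace=NS_URI)
unprefixed = buf.getvalue().decode("utf-8")

print(prefixed)
print(unprefixed)
```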




Answer 2:


To answer your questions in order:

  • You can't just ignore the namespace: not in the path syntax that .findall() uses, and not in "real" XPath (supported by lxml) either. There you'd still be forced to use a prefix, and you'd still need to provide a prefix-to-URI mapping.

  • Use xml_declaration=True as well as encoding='utf-8' in the .write() call (available in lxml, and in the stdlib xml.etree since Python 2.7).

  • I believe lxml will behave like you want: it keeps the namespace declarations of the parsed document when writing it back out.
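To illustrate the first point with the standard library: findall() accepts a namespaces mapping, so a short prefix can be used in the path, but only together with that explicit prefix-to-URI mapping (lxml's xpath() takes a namespaces argument in the same spirit). The prefix "t" is arbitrary and need not appear in the document; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Arbitrary prefix chosen by the caller; it need not appear in the XML.
NAMESPACES = {"t": "http://www.somedomain.com/XI/Traffic/10"}

# Hypothetical input modelled on the question.
SAMPLE = """<Traffic xmlns="http://www.somedomain.com/XI/Traffic/10">
  <Event><EventDetail><EventDescription>road closed</EventDescription></EventDetail></Event>
</Traffic>"""

root = ET.fromstring(SAMPLE)

# "t:Event" is resolved through the mapping to "{uri}Event".
for desc in root.findall("t:Event/t:EventDetail/t:EventDescription", NAMESPACES):
    desc.text = desc.text.upper()

# An unprefixed path matches nothing: the elements live in the namespace.
print(root.findall("Event"))  # []
print(root.find("t:Event/t:EventDetail/t:EventDescription", NAMESPACES).text)  # ROAD CLOSED
```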



Source: https://stackoverflow.com/questions/4886189/python-namespaces-in-xml-elementtree-or-lxml
