Alter namespace prefixing with ElementTree in Python

Posted by 魔方 西西 on 2019-11-29 11:19:46

Question


By default, when you call ElementTree.parse(someXMLfile), the Python ElementTree library prefixes every parsed node's tag with its namespace URI in Clark notation:

    {http://example.org/namespace/spec}mynode

This makes accessing specific nodes by name a huge pain later in the code.
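To make the problem concrete, here is a minimal sketch of what the parsed tags look like. The document string is made up for illustration; only the namespace URI comes from the example above.

```python
import xml.etree.ElementTree as ET

# Made-up document; the prefix "p" and the child element are assumptions.
XML = '<p:mynode xmlns:p="http://example.org/namespace/spec"><p:child/></p:mynode>'

root = ET.fromstring(XML)
print(root.tag)  # {http://example.org/namespace/spec}mynode

# A bare name finds nothing, because the stored tag includes the URI.
print(root.find("child"))  # None

# The full Clark-notation name does match.
clark_child = root.find("{http://example.org/namespace/spec}child")
print(clark_child.tag)
```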

I've read through the docs on ElementTree and namespaces, and it looks like the iterparse() function should let me alter the way the parser prefixes namespaces, but for the life of me I can't actually make it change the prefix. It seems like the prefixing may happen in the background before the start-ns event even fires, as in this example:

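from xml.etree.ElementTree import iterparse

namespaces = []
# start-ns/end-ns are only reported when requested via the events argument
for event, elem in iterparse(source, events=("start", "end", "start-ns", "end-ns")):
    if event == "start-ns":
        namespaces.append(elem)  # elem is a (prefix, uri) tuple here
    elif event == "end-ns":
        namespaces.pop()  # elem is None here
    else:
        ...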

How do I make it change the prefixing behavior and what is the proper thing to return when the function ends?


Answer 1:


You don't specifically need to use iterparse. Instead, the following Python 2 script, which relies on the undocumented ET.fixtag helper:

from cStringIO import StringIO
import xml.etree.ElementTree as ET

NS_MAP = {
    'http://www.red-dove.com/ns/abc' : 'rdc',
    'http://www.adobe.com/2006/mxml' : 'mx',
    'http://www.red-dove.com/ns/def' : 'oth',
}

DATA = '''<?xml version="1.0" encoding="utf-8"?>
<rdc:container xmlns:mx="http://www.adobe.com/2006/mxml"
                 xmlns:rdc="http://www.red-dove.com/ns/abc"
                 xmlns:oth="http://www.red-dove.com/ns/def">
  <mx:Style>
    <oth:style1/>
  </mx:Style>
  <mx:Style>
    <oth:style2/>
  </mx:Style>
  <mx:Style>
    <oth:style3/>
  </mx:Style>
</rdc:container>'''

tree = ET.parse(StringIO(DATA))
some_node = tree.getroot().getchildren()[1]
print ET.fixtag(some_node.tag, NS_MAP)
some_node = some_node.getchildren()[0]
print ET.fixtag(some_node.tag, NS_MAP)

produces

('mx:Style', None)
('oth:style2', None)

This shows how you can recover the prefixed tag names of individual nodes in a parsed tree from their Clark-notation tags. You should be able to adapt this to your specific needs.
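On Python 3, where cStringIO and the print statement are gone, roughly the same result can be had with a few lines of your own. The following is only a sketch: prefixed_tag is a hypothetical helper written for this example, not part of ElementTree, and the XML declaration is dropped from DATA for brevity. It also shows the namespaces argument of findall, which takes a prefix-to-URI mapping and lets you search with prefixed names directly.

```python
import xml.etree.ElementTree as ET

# Namespace URI -> preferred prefix, as in the script above.
NS_MAP = {
    'http://www.red-dove.com/ns/abc': 'rdc',
    'http://www.adobe.com/2006/mxml': 'mx',
    'http://www.red-dove.com/ns/def': 'oth',
}

DATA = '''<rdc:container xmlns:mx="http://www.adobe.com/2006/mxml"
                 xmlns:rdc="http://www.red-dove.com/ns/abc"
                 xmlns:oth="http://www.red-dove.com/ns/def">
  <mx:Style><oth:style1/></mx:Style>
  <mx:Style><oth:style2/></mx:Style>
  <mx:Style><oth:style3/></mx:Style>
</rdc:container>'''


def prefixed_tag(tag, ns_map):
    """Hypothetical helper: turn '{uri}local' into 'prefix:local'."""
    if not tag.startswith("{"):
        return tag  # tag has no namespace
    uri, local = tag[1:].split("}", 1)
    prefix = ns_map.get(uri)
    # Leave the tag untouched if the URI is not in the map.
    return f"{prefix}:{local}" if prefix else tag


root = ET.fromstring(DATA)
second_style = root[1]  # indexing replaces the old getchildren()
print(prefixed_tag(second_style.tag, NS_MAP))     # mx:Style
print(prefixed_tag(second_style[0].tag, NS_MAP))  # oth:style2

# find/findall accept a prefix -> URI mapping, so invert NS_MAP.
PREFIXES = {prefix: uri for uri, prefix in NS_MAP.items()}
styles = root.findall("mx:Style", PREFIXES)
print(len(styles))  # 3
```

Unlike fixtag, this helper returns a plain string rather than a tuple, and it does not invent a prefix for URIs missing from the map.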




Answer 2:


xml.etree.ElementTree doesn't appear to have fixtag, at least not according to the documentation. However, I've looked at some source code for fixtag, and what it does amounts to this:

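import xml.etree.ElementTree as ET

# inFile is the path or file object of the XML document to parse
for event, elem in ET.iterparse(inFile, events=("start", "end")):
    # A Clark-notation tag is "{uri}local": drop the "{" and split at "}"
    namespace, looktag = elem.tag[1:].split("}", 1)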

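You now have the local tag name in looktag, suitable for a lookup, and the namespace URI in namespace. This assumes every tag is namespaced; a tag without a namespace contains no "}", so the unpacking fails.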
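Combining this split with the start-ns events from the question gives a way to report prefixed names while parsing. The following is a rough sketch, not a complete solution: iter_prefixed and the sample document are made up for illustration, and the code ignores end-ns, so it assumes no prefix or URI is rebound deeper in the document.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample document with two namespaces and one plain element.
XML = b"""<a:root xmlns:a="http://example.org/a" xmlns:b="http://example.org/b">
  <a:item><b:leaf/></a:item>
  <plain/>
</a:root>"""


def iter_prefixed(source):
    """Hypothetical helper: yield (prefixed_name, element) per finished element."""
    prefix_for_uri = {}
    for event, payload in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            # For start-ns the payload is a (prefix, uri) tuple.
            prefix, uri = payload
            prefix_for_uri[uri] = prefix
            continue
        tag = payload.tag
        if tag.startswith("{"):
            uri, local = tag[1:].split("}", 1)
            prefix = prefix_for_uri.get(uri, "")
            # A default namespace has an empty prefix: use the local name.
            name = f"{prefix}:{local}" if prefix else local
        else:
            name = tag  # element without a namespace
        yield name, payload


names = [name for name, _ in iter_prefixed(io.BytesIO(XML))]
print(names)  # ['b:leaf', 'a:item', 'plain', 'a:root']
```

The element tags themselves stay in Clark notation; only the reported names change, which keeps find and findall working on the resulting tree.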



Source: https://stackoverflow.com/questions/1249876/alter-namespace-prefixing-with-elementtree-in-python
