Accessing XMLNS attribute with Python ElementTree?

无人及你 2020-12-08 21:29

How can one access namespace (xmlns) attributes using ElementTree?

With the following:



        
3 Answers
  • 2020-12-08 21:35

    I think element.tag is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.

    >>> from xml.etree import ElementTree as ET
    >>> data = '''<data xmlns="http://www.foo.net/a"
    ...                 xmlns:a="http://www.foo.net/a"
    ...                 book="1" category="ABS" date="2009-12-22"/>'''
    >>> element = ET.fromstring(data)
    >>> element
    <Element {http://www.foo.net/a}data at 1013b74d0>
    >>> element.tag
    '{http://www.foo.net/a}data'
    >>> element.attrib
    {'category': 'ABS', 'date': '2009-12-22', 'book': '1'}
    

    If you just want to know the xmlns URI, you can split it out with a function like:

    def tag_uri_and_name(elem):
        if elem.tag[0] == "{":
            uri, ignore, tag = elem.tag[1:].partition("}")
        else:
            uri = None
            tag = elem.tag
        return uri, tag
    

    For much more on namespaces and qualified names in ElementTree, see effbot's examples.
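
As a quick check of the helper above, here is a small self-contained sketch that applies it to the sample element and to an element with no namespace. The second XML string (`<plain/>`) is made up for illustration; it is not part of the question. Note that the `xmlns` declarations never show up in `attrib`; the namespace is only visible through the tag.

```python
from xml.etree import ElementTree as ET


def tag_uri_and_name(elem):
    """Split an ElementTree tag of the form '{uri}name' into (uri, name)."""
    if elem.tag[0] == "{":
        uri, _, tag = elem.tag[1:].partition("}")
    else:
        uri = None
        tag = elem.tag
    return uri, tag


namespaced = ET.fromstring(
    '<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" '
    'book="1" category="ABS" date="2009-12-22"/>'
)
plain = ET.fromstring("<plain/>")  # hypothetical element without a namespace

print(tag_uri_and_name(namespaced))
print(tag_uri_and_name(plain))
```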

  • 2020-12-08 21:38

    Try this:

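    import re
    import sys
    import xml.etree.ElementTree as ET
    
    # Parse the file named on the command line and take its root element.
    root = ET.parse(sys.argv[1]).getroot()
    
    # A namespaced tag looks like '{uri}name'; capture the leading '{uri}' part.
    m = re.match(r'\{.*?\}', root.tag)
    xmlns = m.group(0) if m else ''
    
    # 'the_tag_you_want' is a placeholder for the child tag to look up.
    print(root.find(xmlns + 'the_tag_you_want').text)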
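
An alternative that avoids the regular expression, offered as a sketch rather than a drop-in replacement: `find` and `findall` in `xml.etree.ElementTree` accept a prefix-to-URI dict as a second argument. The XML string, the prefix name `ns` and the tag `item` below are placeholders chosen for illustration; only the namespace URI comes from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the namespace URI is taken from the question.
xml_text = (
    '<data xmlns="http://www.foo.net/a">'
    '<item>hello</item>'
    '</data>'
)
root = ET.fromstring(xml_text)

# The prefix used here ("ns") is arbitrary; it only has to match the dict key.
namespaces = {"ns": "http://www.foo.net/a"}

item = root.find("ns:item", namespaces)
print(item.text)

# Equivalent lookup with the URI written out in Clark notation ({uri}tag).
same_item = root.find("{http://www.foo.net/a}item")
print(same_item is item)
```

The dict form keeps the URI in one place, which helps when several lookups share the same namespace.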
    
  • 2020-12-08 21:57

    Look at the effbot namespaces documentation/examples, specifically the parse_map function. It shows how to add an ns_map attribute to each element, containing the prefix/URI mapping that applies to that specific element.

    However, that adds the ns_map attribute to all the elements. For my needs, I wanted a global map of all the namespaces used, so that element lookup is easier and nothing is hardcoded.

    Here's what I came up with:

    import xml.etree.ElementTree as ET  # stdlib home of the old elementtree package
    
    def parse_and_get_ns(file):
        events = "start", "start-ns"
        root = None
        ns = {}
        for event, elem in ET.iterparse(file, events):
            if event == "start-ns":
                if elem[0] in ns and ns[elem[0]] != elem[1]:
                    # NOTE: It is perfectly valid to have the same prefix refer
                    #     to different URI namespaces in different parts of the
                    #     document. This exception serves as a reminder that this
                    #     solution is not robust. Use at your own peril.
                    raise KeyError("Duplicate prefix with different URI found.")
                ns[elem[0]] = "{%s}" % elem[1]
            elif event == "start":
                if root is None:
                    root = elem
        return ET.ElementTree(root), ns
    

    With this you can parse an XML file and obtain a dict with the namespace mappings. So, if you have an XML file like the following ("my.xml"):

    <?xml version="1.0" encoding="UTF-8" ?>
    <rss version="2.0"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    >
    <feed>
      <item>
        <title>Foo</title>
        <dc:creator>Joe McGroin</dc:creator>
        <description>etc...</description>
      </item>
    </feed>
    </rss>
    

    You will be able to use the XML namespaces and get info for elements like dc:creator:

    >>> tree, ns = parse_and_get_ns("my.xml")
    >>> ns
    {u'content': '{http://purl.org/rss/1.0/modules/content/}',
    u'dc': '{http://purl.org/dc/elements/1.1/}'}
    >>> item = tree.find("feed/item")
    >>> item.findtext(ns['dc']+"creator")
    'Joe McGroin'
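
For reference, here is a self-contained variant of the same idea, assuming Python 3 and the standard-library `xml.etree.ElementTree`. It reads the sample feed from an in-memory buffer instead of `my.xml`, and it stores bare URIs (no braces) so the map can be handed straight to `find`/`findtext`. The function name `collect_namespaces` is made up for this sketch. Unlike the version above, it silently overwrites a prefix that is declared twice, so it is equally unsafe for documents that rebind a prefix.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
>
<feed>
  <item>
    <title>Foo</title>
    <dc:creator>Joe McGroin</dc:creator>
    <description>etc...</description>
  </item>
</feed>
</rss>
"""


def collect_namespaces(source):
    """Parse source and return (root element, {prefix: uri})."""
    root = None
    ns = {}
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # start-ns yields a (prefix, uri) tuple
            ns[prefix] = uri       # a later declaration overwrites an earlier one
        elif root is None:
            root = payload         # the first "start" event is the root element
    return root, ns


root, ns = collect_namespaces(io.BytesIO(SAMPLE))
print(ns)
print(root.findtext("feed/item/dc:creator", namespaces=ns))
```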
    