Accessing XMLNS attribute with Python ElementTree?

无人及你 2020-12-08 21:29

How can one access namespace (xmlns) attributes using ElementTree?

With the following:



        
3 answers
  •  北荒 (original poster)
    2020-12-08 21:57

    Look at the effbot namespace documentation and examples, specifically the parse_map function. It shows how to add an *ns_map* attribute to each element, containing the prefix/URI mapping that applies to that specific element.
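
A rough sketch of that per-element idea follows. It is not the effbot code itself: the function name `parse_with_ns_maps` and the sample document are made up for illustration, and the maps are kept in a side dictionary keyed by element rather than set as an attribute, on the assumption that element objects in current CPython may not accept arbitrary new attributes.

```python
import io
import xml.etree.ElementTree as ET

def parse_with_ns_maps(source):
    """Parse `source`; return (tree, {element: {prefix: uri}})."""
    ns_stack = []  # (prefix, uri) pairs currently in scope, innermost last
    ns_maps = {}   # element -> prefix/URI mapping that applies to it
    root = None
    events = ("start", "start-ns", "end-ns")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            ns_stack.append(payload)   # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            ns_stack.pop()             # the declaration went out of scope
        else:  # "start": payload is the element being opened
            if root is None:
                root = payload
            ns_maps[payload] = dict(ns_stack)
    return ET.ElementTree(root), ns_maps

# Hypothetical document with one declaration on the root and one on a child.
doc = b'<root xmlns:a="urn:example:a"><child xmlns:b="urn:example:b"/><other/></root>'
tree, ns_maps = parse_with_ns_maps(io.BytesIO(doc))
root = tree.getroot()
print(ns_maps[root])     # mapping in scope on <root>
print(ns_maps[root[0]])  # mapping in scope on <child>
```

Because `dict(ns_stack)` is built from outermost to innermost declaration, an inner rebinding of a prefix overrides the outer one for that element only.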

    However, that adds the ns_map attribute to every element. For my needs, I wanted a single global map of all the namespaces used, so that element lookups are easier and the namespace URIs are not hardcoded.

    Here's what I came up with:

    import xml.etree.ElementTree as ET

    def parse_and_get_ns(file):
        """Parse an XML file; return (tree, {prefix: "{uri}"})."""
        events = "start", "start-ns"
        root = None
        ns = {}
        for event, elem in ET.iterparse(file, events):
            if event == "start-ns":
                # For "start-ns" events, elem is a (prefix, uri) tuple.
                prefix, uri = elem
                qualified = "{%s}" % uri
                if prefix in ns and ns[prefix] != qualified:
                    # NOTE: It is perfectly valid to have the same prefix refer
                    #     to different URI namespaces in different parts of the
                    #     document. This exception serves as a reminder that this
                    #     solution is not robust. Use at your own peril.
                    raise KeyError("Duplicate prefix with different URI found.")
                ns[prefix] = qualified
            elif event == "start":
                # The first element opened is the document root.
                if root is None:
                    root = elem
        return ET.ElementTree(root), ns
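
The caveat in the NOTE comment can be reproduced with a small document that rebinds a prefix. This is only a sketch: `collect_prefixes` is a hypothetical, trimmed-down helper that applies the same conflict check to plain URIs, and the `urn:example:*` URIs are invented for the demonstration.

```python
import io
import xml.etree.ElementTree as ET

def collect_prefixes(source):
    """Return {prefix: uri}; raise KeyError if a prefix is bound to two URIs."""
    ns = {}
    # With only "start-ns" requested, each event carries a (prefix, uri) tuple.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if prefix in ns and ns[prefix] != uri:
            raise KeyError("Duplicate prefix with different URI found.")
        ns[prefix] = uri
    return ns

# Hypothetical document: the prefix "p" is rebound on the inner element,
# which is valid XML but cannot be represented in one global map.
rebinding = b'<root xmlns:p="urn:example:one"><inner xmlns:p="urn:example:two"/></root>'

try:
    collect_prefixes(io.BytesIO(rebinding))
    conflict = False
except KeyError:
    conflict = True

print(conflict)  # True
```

A document like this needs a per-element mapping instead of a single global one.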
    

    With this you can parse an XML file and obtain a dict with the namespace mappings. So, if you have an XML file like the following ("my.xml"):

    
    
    
      
        Foo
        Joe McGroin
        etc...
      
    
    
    

    You will be able to use the XML namespaces and get info for elements like dc:creator:

    >>> tree, ns = parse_and_get_ns("my.xml")
    >>> ns
    {'content': '{http://purl.org/rss/1.0/modules/content/}',
     'dc': '{http://purl.org/dc/elements/1.1/}'}
    >>> item = tree.find("feed/item")
    >>> item.findtext(ns['dc']+"creator")
    'Joe McGroin'
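
As an alternative to concatenating "{uri}" strings, the standard library's find methods accept a `namespaces` mapping of prefix to plain URI, so the prefixed name can be written directly. The sketch below assumes a stand-in for "my.xml": the two namespace URIs, the feed/item path and the dc:creator value come from the session above, while the root element name `rss` and the `title` element are guesses made only so the example runs.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for "my.xml"; the root and <title> names are assumptions.
MY_XML = b"""<rss xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <feed>
    <item>
      <title>Foo</title>
      <dc:creator>Joe McGroin</dc:creator>
    </item>
  </feed>
</rss>"""

# Collect prefix -> URI pairs from the "start-ns" events (plain URIs, no braces).
ns = dict(
    pair
    for _event, pair in ET.iterparse(io.BytesIO(MY_XML), events=("start-ns",))
)

root = ET.fromstring(MY_XML)
item = root.find("feed/item")
# The mapping lets the query use the same prefix as the document.
creator = item.findtext("dc:creator", namespaces=ns)
print(creator)  # Joe McGroin
```

The same single-map caveat applies here: if a prefix is rebound somewhere in the document, the last declaration seen wins.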
    
