How can one access namespace (xmlns) attributes using ElementTree?
With the following:
When I call root.get('xmlns') I get back None, while category and date come back fine. Any help appreciated.
I think element.tag is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.
    >>> from xml.etree import ElementTree as ET
    >>> data = ''''''
    >>> element = ET.fromstring(data)
    >>> element
    >>> element.tag
    '{http://www.foo.net/a}data'
    >>> element.attrib
    {'category': 'ABS', 'date': '2009-12-22', 'book': '1'}
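For a version you can paste and run, here is a minimal sketch. The XML string is an assumption, reconstructed from the tag and attributes shown in the output above; it is not the asker's original document. It shows why get('xmlns') comes back as None: the parser consumes the xmlns declaration and folds the URI into the tag instead of keeping it as an attribute.

```python
from xml.etree import ElementTree as ET

# Hypothetical input: a single self-closing element in a default namespace.
data = '<data xmlns="http://www.foo.net/a" book="1" category="ABS" date="2009-12-22"/>'
element = ET.fromstring(data)

xmlns_attr = element.get("xmlns")    # None: xmlns is not stored as an attribute
tag = element.tag                    # namespace URI in braces, then the local name
category = element.get("category")  # ordinary attributes are unaffected

print(xmlns_attr, tag, category)
```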
If you just want to know the xmlns URI, you can split it out with a function like:
    def tag_uri_and_name(elem):
        if elem.tag[0] == "{":
            uri, ignore, tag = elem.tag[1:].partition("}")
        else:
            uri = None
            tag = elem.tag
        return uri, tag
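A quick usage sketch, assuming two throwaway elements (both XML strings are made up for illustration). The helper is repeated so the snippet runs on its own.

```python
from xml.etree import ElementTree as ET

def tag_uri_and_name(elem):
    # Same helper as above, repeated so this snippet is self-contained.
    if elem.tag[0] == "{":
        uri, _, tag = elem.tag[1:].partition("}")
    else:
        uri, tag = None, elem.tag
    return uri, tag

# Hypothetical inputs: one namespaced element, one without a namespace.
qualified = ET.fromstring('<data xmlns="http://www.foo.net/a"/>')
plain = ET.fromstring('<data/>')

qualified_result = tag_uri_and_name(qualified)  # URI and local name split apart
plain_result = tag_uri_and_name(plain)          # no namespace, so the URI is None

print(qualified_result, plain_result)
```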
For much more on namespaces and qualified names in ElementTree, see effbot's examples.
Look at the effbot namespaces documentation/examples; specifically the parse_map function. It shows you how to add an *ns_map* attribute to each element which contains the prefix/URI mapping that applies to that specific element.
However, that adds the ns_map attribute to every element. For my needs, I wanted a single global map of all the namespaces used, so that element lookup is easier and the URIs are not hardcoded.
Here's what I came up with:
    import xml.etree.ElementTree as ET

    def parse_and_get_ns(file):
        events = "start", "start-ns"
        root = None
        ns = {}
        for event, elem in ET.iterparse(file, events):
            if event == "start-ns":
                # For start-ns events, elem is a (prefix, uri) tuple.
                prefix, uri = elem
                qualified = "{%s}" % uri
                # Compare against the stored "{uri}" form, so that redeclaring
                # a prefix with the same URI is not treated as a conflict.
                if prefix in ns and ns[prefix] != qualified:
                    # NOTE: It is perfectly valid to have the same prefix refer
                    # to different URI namespaces in different parts of the
                    # document. This exception serves as a reminder that this
                    # solution is not robust. Use at your own peril.
                    raise KeyError("Duplicate prefix with different URI found.")
                ns[prefix] = qualified
            elif event == "start":
                if root is None:
                    root = elem
        return ET.ElementTree(root), ns
With this you can parse an XML file and obtain a dict with the namespace mappings. So, if you have an XML file like the following ("my.xml"):
Foo Joe McGroin etc...
You will be able to use the XML namespaces and get info for elements like dc:creator:
    >>> tree, ns = parse_and_get_ns("my.xml")
    >>> ns
    {u'content': '{http://purl.org/rss/1.0/modules/content/}', u'dc': '{http://purl.org/dc/elements/1.1/}'}
    >>> item = tree.find("/feed/item")
    >>> item.findtext(ns['dc']+"creator")
    'Joe McGroin'
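On Python 3 there is a variation that may be worth considering: keep the map as plain prefix-to-URI pairs and pass it to find/findtext through their namespaces argument, so paths can use prefixes like dc:creator directly. This is only a sketch. The feed below and the name parse_with_prefix_map are assumptions for illustration, not the contents of "my.xml" or part of any library.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical feed; the element names and nesting are assumptions.
XML = b"""<rss xmlns:dc="http://purl.org/dc/elements/1.1/">
  <feed>
    <item>
      <title>Foo</title>
      <dc:creator>Joe McGroin</dc:creator>
    </item>
  </feed>
</rss>"""

def parse_with_prefix_map(source):
    """Return (root, {prefix: uri}) built from the start-ns events."""
    prefixes = {}
    root = None
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload
            # Last declaration wins; a prefix rebound to another URI deeper
            # in the document would be silently overwritten here.
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # the first "start" event is the root element
    return root, prefixes

root, ns = parse_with_prefix_map(io.BytesIO(XML))
creator = root.findtext("feed/item/dc:creator", namespaces=ns)
print(creator)
```

The trade-off is the same as with the global map above: one flat dict is convenient, but it cannot represent a prefix that means different things in different parts of the document.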