Python: ElementTree, get the namespace string of an Element

Question

This XML file is named example.xml:

<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

  <modelVersion>14.0.0</modelVersion>
  <groupId>.com.foobar.flubber</groupId>
  <artifactId>uberportalconf</artifactId>
  <version>13-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>Environment for UberPortalConf</name>
  <description>This is the description</description>    
  <properties>
      <birduberportal.version>11</birduberportal.version>
      <promotiondevice.version>9</promotiondevice.version>
      <foobarportal.version>6</foobarportal.version>
      <eventuberdevice.version>2</eventuberdevice.version>
  </properties>
  <!-- A lot more here, but as it is irrelevant for the problem I have removed it -->
</project>

If I load example.xml and parse it with ElementTree I can see its namespace is http://maven.apache.org/POM/4.0.0.

>>> from xml.etree import ElementTree
>>> tree = ElementTree.parse('example.xml')
>>> print(tree.getroot())
<Element '{http://maven.apache.org/POM/4.0.0}project' at 0x26ee0f0>

I have not found a method that returns just the namespace of an Element without resorting to parsing the str(an_element) representation. It seems like there has got to be a better way.

Answer 1:

The namespace should be in Element.tag right before the "actual" tag:

>>> root = tree.getroot()
>>> root.tag
'{http://maven.apache.org/POM/4.0.0}project'

To learn more about namespaces, take a look at "ElementTree: Working with Namespaces and Qualified Names".
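Because every element in scope of the default namespace carries the URI in its tag, the same '{uri}local' form is what lookups expect. A minimal sketch, assuming a cut-down inline copy of example.xml (the names xml_text and POM_NS are made up for illustration):

```python
from xml.etree import ElementTree

# Hypothetical inline stand-in for example.xml.
xml_text = (
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<artifactId>uberportalconf</artifactId>'
    '</project>'
)
root = ElementTree.fromstring(xml_text)

# Children inherit the default namespace, so their tags are qualified too.
child_tags = [child.tag for child in root]

# QName builds the same '{uri}local' string, ready to pass to find().
POM_NS = 'http://maven.apache.org/POM/4.0.0'
artifact = root.find(ElementTree.QName(POM_NS, 'artifactId').text)
```

Here artifact is the artifactId element; a plain root.find('artifactId') would return None because the unqualified name does not match.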

Answer 2:

This is a perfect task for a regular expression.

import re

def namespace(element):
    # Match the leading '{uri}' part of the tag; the result keeps the braces.
    m = re.match(r'\{.*\}', element.tag)
    return m.group(0) if m else ''
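Note that the function above returns the URI wrapped in braces. If you want the bare URI, one possible variant uses a capture group; this is only a sketch, and the name namespace_uri is invented here:

```python
import re
from xml.etree import ElementTree

def namespace_uri(element):
    """Return the namespace URI of element.tag, or '' if it has none."""
    # [^}]* stops at the first closing brace; group(1) leaves the braces out.
    m = re.match(r'\{([^}]*)\}', element.tag)
    return m.group(1) if m else ''

# Two throwaway elements to exercise both branches.
qualified = ElementTree.Element('{http://maven.apache.org/POM/4.0.0}project')
plain = ElementTree.Element('project')
```

Calling namespace_uri(qualified) gives the URI without braces, and namespace_uri(plain) gives an empty string.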

Answer 3:

I am not sure if this is possible with xml.etree, but here is how you could do it with lxml.etree:

>>> from lxml import etree
>>> tree = etree.parse('example.xml')
>>> tree.xpath('namespace-uri(.)')
'http://maven.apache.org/POM/4.0.0'

Answer 4:

Without using regular expressions:

>>> root
<Element '{http://www.google.com/schemas/sitemap/0.84}urlset' at 0x2f7cc10>

>>> root.tag.split('}')[0].strip('{')
'http://www.google.com/schemas/sitemap/0.84'
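One caveat with the one-liner: for a tag that has no namespace it returns the tag name itself rather than an empty string. A small sketch that handles that case, assuming you want both the URI and the local name back (split_tag is a hypothetical helper name):

```python
def split_tag(tag):
    """Split '{uri}local' into (uri, local); uri is '' when there is none."""
    if tag.startswith('{'):
        # Drop the opening brace, then cut at the first closing brace.
        uri, _, local = tag[1:].partition('}')
        return uri, local
    return '', tag

with_ns = split_tag('{http://www.google.com/schemas/sitemap/0.84}urlset')
without_ns = split_tag('urlset')

# For comparison: the one-liner applied to an un-namespaced tag.
one_liner_result = 'urlset'.split('}')[0].strip('{')
```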

Answer 5:

I think it will be easier to take a look at the attributes:

>>> root.attrib
{'{http://www.w3.org/2001/XMLSchema-instance}schemaLocation':
   'http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd'}

Answer 6:

Elements in the lxml.etree library have a dictionary called nsmap, which shows all the namespaces that are in scope for the current tag.

>>> item = next(tree.getroot().iter())
>>> item.nsmap
{'md': 'urn:oasis:names:tc:SAML:2.0:metadata'}

Answer 7:

The short answer is:

next(uri for uri, prefix in ElementTree._namespace_map.items() if prefix == '')

but only if you have been calling

ElementTree.register_namespace(prefix, uri)

in response to every "start-ns" event received while iterating through the result of

ElementTree.iterparse(...)

and you requested "start-ns" events.
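The "start-ns" loop described above can be sketched roughly as follows. This version collects the declarations into a local dict instead of calling register_namespace, so it does not touch module-wide state; the inline document and the names xml_bytes, declared and default_ns are assumptions made for the example:

```python
import io
from xml.etree import ElementTree

# Hypothetical inline document standing in for a real file.
xml_bytes = (
    b'<project xmlns="http://maven.apache.org/POM/4.0.0" '
    b'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    b'<packaging>pom</packaging>'
    b'</project>'
)

declared = {}  # prefix -> URI; the default namespace has prefix ''
events = ElementTree.iterparse(io.BytesIO(xml_bytes), events=('start-ns',))
for event, payload in events:
    # For "start-ns" events the payload is a (prefix, uri) tuple.
    prefix, uri = payload
    # Keep the first declaration seen per prefix, i.e. the outermost one.
    declared.setdefault(prefix, uri)

default_ns = declared.get('')  # None if no default namespace was declared
```

Keeping only the first declaration is a design choice for this sketch: as point (1) below explains, a document may re-declare the default namespace deeper in the tree, and those later declarations are ignored here.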

To answer the question "what is the default namespace?", it is necessary to clarify two points:

(1) XML specifications say that the default namespace is not necessarily global throughout the tree, rather the default namespace can be re-declared at any element under root, and inherits downwards until meeting another default namespace re-declaration.

(2) The ElementTree module can (de facto) handle XML-like documents that have no default namespace on the root, if they use no namespaces anywhere in the document. (There may be less strict conditions; that is an "if", not necessarily an "iff".)

It's probably also worth considering "what do you want it for?" Consider that XML files can be semantically equivalent, but syntactically very different. E.g., the following three files are semantically equivalent, but A.xml has one default namespace declaration, B.xml has three, and C.xml has none.

A.xml:
<a xmlns="http://A" xmlns:nsB0="http://B0" xmlns:nsB1="http://B1">
     <nsB0:b/>
     <nsB1:b/>
</a>

B.xml:
<a xmlns="http://A">
     <b xmlns="http://B0"/>
     <b xmlns="http://B1"/>
</a>

C.xml:
<{http://A}a>
     <{http://B0}b/>
     <{http://B1}b/>
</a>

The file C.xml is the canonical expanded syntactical representation presented to the ElementTree search functions.
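A quick way to check the equivalence is to parse inline copies of A.xml and B.xml, written with xmlns declarations, and compare the expanded tags ElementTree produces. This is a minimal sketch; the helper name expanded_tags is made up for illustration:

```python
from xml.etree import ElementTree

# A.xml: one default namespace plus two prefixed namespaces.
a_xml = (
    '<a xmlns="http://A" xmlns:nsB0="http://B0" xmlns:nsB1="http://B1">'
    '<nsB0:b/><nsB1:b/>'
    '</a>'
)
# B.xml: the default namespace is re-declared on each child.
b_xml = (
    '<a xmlns="http://A">'
    '<b xmlns="http://B0"/><b xmlns="http://B1"/>'
    '</a>'
)

def expanded_tags(xml_text):
    """Return every element tag in document order, in '{uri}local' form."""
    return [el.tag for el in ElementTree.fromstring(xml_text).iter()]

tags_a = expanded_tags(a_xml)
tags_b = expanded_tags(b_xml)
```

Both lists come out identical and match the tag names written out in C.xml, which is why the prefix or default-namespace spelling in the source file does not matter to the search functions.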

If you are certain a priori that there will be no namespace collisions, you can modify the element tags while parsing as discussed here: Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall"
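The linked answer is not reproduced here. As a rough sketch of the general idea only, and assuming the no-collision condition holds, the namespace part can also be stripped from every tag after parsing, so that plain names work in find (the inline xml_text is a hypothetical stand-in for example.xml):

```python
from xml.etree import ElementTree

# Hypothetical cut-down copy of example.xml.
xml_text = (
    '<project xmlns="http://maven.apache.org/POM/4.0.0">'
    '<artifactId>uberportalconf</artifactId>'
    '</project>'
)
root = ElementTree.fromstring(xml_text)

for el in root.iter():
    # Guard: comments and processing instructions have non-string tags.
    if isinstance(el.tag, str) and el.tag.startswith('{'):
        # Keep only the local name after the closing brace.
        el.tag = el.tag.partition('}')[2]

artifact = root.find('artifactId')
```

This discards namespace information for good, so it is only safe when two different namespaces never use the same local name.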

Answer 8:

Combining some of the answers above, I think the shortest code is:

theroot = tree.getroot()
theroot.attrib[theroot.keys()[0]]

Source: https://stackoverflow.com/questions/9513540/python-elementtree-get-the-namespace-string-of-an-element

Tags

python

elementtree