I see there are similar questions here, but nothing that has totally helped me. I've also looked at the official documentation on namespaces but can't find anything that
First off, welcome to the StackOverflow network! Technically @anand-s-kumar is correct. However, there was a minor misuse of the tostring function, and the namespaces might not always be known by the code, or might differ between tags or XML files. Also, inconsistencies between the lxml and xml.etree libraries, and between Python 2.x and 3.x, make handling this difficult.
This function iterates through every element in the XML tree passed in as tree and rewrites each tag to remove its namespace. Note that this discards the namespace information, so some data may be lost: elements that differed only by namespace become indistinguishable.
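import re

def remove_namespaces(tree):
    # iter() replaces getiterator(), which was removed in Python 3.9.
    for el in tree.iter():
        # Drop an optional leading "{namespace}" and keep the local name.
        match = re.match(r"^(?:\{.*?\})?(.*)$", el.tag)
        if match:
            el.tag = match.group(1)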
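
To make the trade-off concrete, here is a minimal usage sketch of the function above with the standard-library xml.etree.ElementTree. The sample document and its example.com namespace URIs are made up for illustration; they are not from the original question.

```python
import re
import xml.etree.ElementTree as ET

def remove_namespaces(tree):
    """Strip the "{uri}" part from every element tag, in place."""
    for el in tree.iter():
        match = re.match(r"^(?:\{.*?\})?(.*)$", el.tag)
        if match:
            el.tag = match.group(1)

# Hypothetical document: two <id> elements in different namespaces.
xml = (
    '<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b">'
    '<a:id>1</a:id><b:id>2</b:id>'
    '</root>'
)
root = ET.fromstring(xml)
remove_namespaces(root)

# Plain tag names now work in searches...
ids = [el.text for el in root.findall("id")]
# ...but the two elements can no longer be told apart by namespace.
tags = [el.tag for el in root]
print(ids, tags)
```

Only element tags are rewritten; namespaced attribute names are left as they were, so they would need separate handling if they matter to you.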
I myself just ran into this problem, and hacked together a quick solution. I tested this on about 81,000 XML files (averaging around 150 MB each) that had this problem, and all of them were fixed. Note that this isn't exactly an optimal solution, but it is relatively efficient and worked quite well for me.
CREDIT: Idea and code structure originally from Jochen Kupperschmidt.
You need to register the prefix and the namespace before the tree is serialized (most simply, before you call fromstring() to read the XML) to avoid the default namespace prefixes (ns0, ns1, etc.).
You can use the ET.register_namespace() function for that. Example:
ET.register_namespace('<prefix>','http://Test.the.Sdk/2010/07')
ET.register_namespace('a','http://schema.test.org/2004/07/Test.Soa.Vocab')
You can leave the <prefix> empty (an empty string) if you do not want a prefix; the namespace is then written as the default namespace.
Example/demo:
>>> r = ET.fromstring('<a xmlns="blah">a</a>')
>>> ET.tostring(r)
b'<ns0:a xmlns:ns0="blah">a</ns0:a>'
>>> ET.register_namespace('','blah')
>>> r = ET.fromstring('<a xmlns="blah">a</a>')
>>> ET.tostring(r)
b'<a xmlns="blah">a</a>'
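
The demo above uses an empty prefix. As a complementary sketch, the same call with a named prefix might look like the following; the prefix v and the example.com URI are placeholders chosen for illustration, not values from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, used only for this illustration.
NS = "http://example.com/vocab"
ET.register_namespace("v", NS)

root = ET.fromstring(
    '<item xmlns="http://example.com/vocab"><name>x</name></item>'
)

# encoding="unicode" makes tostring() return str instead of bytes.
out = ET.tostring(root, encoding="unicode")
print(out)
```

The serialized output should use the v: prefix in place of the auto-generated ns0. Note that register_namespace() updates a global registry, so it affects every tree serialized afterwards in the same process.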