I see there are similar questions here, but nothing that has totally helped me. I've also looked at the official documentation on namespaces but can't find anything that
First off, welcome to the StackOverflow network! Technically @anand-s-kumar is correct. However, there was a minor misuse of the toString function, and the namespaces might not always be known by the code, or might differ between tags or between XML files. Inconsistencies between the lxml and xml.etree libraries, and between Python 2.x and 3.x, also make this difficult to handle.
This function iterates over every element in the XML tree that is passed in and rewrites each tag to strip its namespace. Note that this discards the namespace information, so elements that share a local name but come from different namespaces can no longer be told apart.
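    import re

    def remove_namespaces(tree):
        # iter() replaces getiterator(), which was removed in Python 3.9
        for el in tree.iter():
            # Strip a leading "{namespace-uri}" prefix from the tag, if present
            match = re.match(r"^(?:\{.*?\})?(.*)$", el.tag)
            if match:
                el.tag = match.group(1)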
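As a rough usage sketch, here is the function applied to a small document using the standard library's xml.etree.ElementTree. The sample XML and its two namespace URIs are made up for illustration. The `isinstance` guard is my own addition, on the assumption that some parsers (lxml, for example) give comments and processing instructions a non-string tag that `re.match` would choke on.

```python
import re
import xml.etree.ElementTree as ET


def remove_namespaces(tree):
    for el in tree.iter():
        # Assumption: skip nodes whose tag is not a string
        # (e.g. comments or processing instructions in lxml).
        if not isinstance(el.tag, str):
            continue
        # Strip a leading "{namespace-uri}" prefix from the tag, if present
        match = re.match(r"^(?:\{.*?\})?(.*)$", el.tag)
        if match:
            el.tag = match.group(1)


# Hypothetical sample document with two made-up namespaces
SAMPLE = """<root xmlns="http://example.com/a" xmlns:b="http://example.com/b">
  <item>first</item>
  <b:item>second</b:item>
</root>"""

root = ET.fromstring(SAMPLE)
tags_before = [el.tag for el in root.iter()]

remove_namespaces(root)
tags_after = [el.tag for el in root.iter()]

# After stripping, a plain tag name finds elements from both namespaces
texts = [el.text for el in root.findall("item")]

print(tags_before)
print(tags_after)
print(texts)
```

The two `item` elements started out in different namespaces, but a plain `findall("item")` now returns both. That is the convenience, and it is also the data loss mentioned above.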
I ran into this problem myself and hacked together a quick solution. I tested it on about 81,000 XML files (averaging around 150 MB each) that had this problem, and all of them were fixed. This isn't an optimal solution, but it is relatively efficient and worked quite well for me.
CREDIT: Idea and code structure originally from Jochen Kupperschmidt.