I have an XML file in the following format:
<?xml version="1.0" encoding="utf-8"?>
<foo>
    <bar>
        <bat>1</bat>
    </bar>
    <a>
        <b xmlns="urn:schemas-microsoft-com:asm.v1">
            <c>1</c>
        </b>
    </a>
</foo>
I want to change the value of bat to '2' so that the file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<foo>
    <bar>
        <bat>2</bat>
    </bar>
    <a>
        <b xmlns="urn:schemas-microsoft-com:asm.v1">
            <c>1</c>
        </b>
    </a>
</foo>
I open this file like this:
import xml.etree.ElementTree as ET

tree = ET.parse(filePath)
root = tree.getroot()
I then change the value of bat to '2' and save the file like this:
tree.write(filePath, encoding="utf-8", xml_declaration=True, method="xml")
The value of bat successfully changes to 2, but the XML file now looks like this:
<?xml version="1.0" encoding="utf-8"?>
<foo xmlns:ns0="urn:schemas-microsoft-com:asm.v1">
    <bar>
        <bat>2</bat>
    </bar>
    <a>
        <ns0:b>
            <ns0:c>1</ns0:c>
        </ns0:b>
    </a>
</foo>
To get rid of the generated ns0 prefix, I do the following before parsing the document:
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
This gets rid of the ns0 namespace prefix, but the namespace declaration moves to the root element and the XML file now looks like this:
<?xml version="1.0" encoding="utf-8"?>
<foo xmlns="urn:schemas-microsoft-com:asm.v1">
    <bar>
        <bat>2</bat>
    </bar>
    <a>
        <b>
            <c>1</c>
        </b>
    </a>
</foo>
What do I do to get the output I need?
As far as I know, there is no way to achieve your goal using xml.etree.ElementTree methods alone. After digging into the xml.etree source code and the XML specification, I found that the library's behaviour is neither wrong nor unreasonable; it simply does not allow the output you are looking for.
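To illustrate why the declaration moves, here is a minimal sketch (the shortened document and the variable names are just for illustration). The parser stores each namespace inside the tag name as {uri}tag and drops the xmlns attribute, so the serializer no longer knows where the declaration originally sat and puts every declaration on the root element.

```python
import xml.etree.ElementTree as ET

URI = "urn:schemas-microsoft-com:asm.v1"
data = f'<foo><a><b xmlns="{URI}"><c>1</c></b></a></foo>'

root = ET.fromstring(data)
b = root.find("a")[0]

# The namespace is folded into the tag name and the xmlns attribute
# is gone, so the position of the original declaration is lost.
print(b.tag)     # {urn:schemas-microsoft-com:asm.v1}b
print(b.attrib)  # {}

# With no prefix registered, the serializer generates ns0 and
# declares it on the root element.
unregistered = ET.tostring(root, encoding="unicode")
print(unregistered)

# Registering '' as the prefix removes ns0, but the declaration is
# still written on the root element, not on <b>.
ET.register_namespace("", URI)
registered = ET.tostring(root, encoding="unicode")
print(registered)
```

Note that the second result also changes the meaning of the document, because foo, bar and bat now fall inside the default namespace as well.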
To achieve your goal with that library, you have to customize the rendering behaviour. To best suit your needs, I have written the following render function.
from re import findall, sub
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def render(root, buffer='', namespaces=None, level=0, indent_size=2, encoding='utf-8'):
    # Emit the XML declaration once, at the top level only.
    buffer += f'<?xml version="1.0" encoding="{encoding}" ?>\n' if not level else ''
    element = root.getroot() if isinstance(root, ET.ElementTree) else root
    # ET._namespaces is a private helper of the library: it maps every
    # namespace URI used in the tree to its prefix ('' for the default one).
    if not level:
        _, namespaces = ET._namespaces(element)
    indent = ' ' * indent_size * level
    # Strip the '{uri}' part of the tag to get the local name.
    # Limitation: only default (unprefixed) namespaces render correctly.
    tag = sub(r'({[^}]+}\s*)*', '', element.tag)
    buffer += f'{indent}<{tag}'
    for ns in findall(r'{[^}]+}', element.tag):
        ns_key = ns[1:-1]
        if ns_key not in namespaces:
            continue
        # Declare the namespace on the first element that uses it, then forget it.
        prefix = namespaces.pop(ns_key)
        buffer += ' xmlns' + (f':{prefix}' if prefix else '') + f'="{ns_key}"'
    for k, v in element.attrib.items():
        buffer += f' {k}={quoteattr(v)}'
    # Tail text (text that follows a closing tag) is not rendered.
    buffer += '>' + (escape(element.text.strip()) if element.text else '')
    children = list(element)
    for child in children:
        sep = '\n' if buffer[-1] != '\n' else ''
        buffer += sep + render(child, level=level + 1, indent_size=indent_size, namespaces=namespaces)
    buffer += f'{indent}</{tag}>\n' if children else f'</{tag}>\n'
    return buffer
Pass your XML input data to the render() function above as follows:
data = '''<?xml version="1.0" encoding="utf-8"?>
<foo>
    <bar>
        <bat>1</bat>
    </bar>
    <a>
        <b xmlns="urn:schemas-microsoft-com:asm.v1">
            <c>1</c>
        </b>
    </a>
</foo>'''

root = ET.ElementTree(ET.fromstring(data))
ET.register_namespace('', "urn:schemas-microsoft-com:asm.v1")
print(render(root))
It prints the output you are looking for:
<?xml version="1.0" encoding="utf-8" ?>
<foo>
  <bar>
    <bat>1</bat>
  </bar>
  <a>
    <b xmlns="urn:schemas-microsoft-com:asm.v1">
      <c>1</c>
    </b>
  </a>
</foo>
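If you are not tied to ElementTree, a different standard-library technique may be simpler. The sketch below swaps in xml.dom.minidom, which is not the approach used above: it keeps each xmlns declaration as an ordinary attribute on the element where it was written, so the declaration survives a round trip. The lookup by tag name and the variable names are assumptions made for this example.

```python
from xml.dom import minidom

data = b'''<?xml version="1.0" encoding="utf-8"?>
<foo>
  <bar>
    <bat>1</bat>
  </bar>
  <a>
    <b xmlns="urn:schemas-microsoft-com:asm.v1">
      <c>1</c>
    </b>
  </a>
</foo>'''

doc = minidom.parseString(data)

# Assumes exactly one <bat> element whose only child is a text node.
bat = doc.getElementsByTagName("bat")[0]
bat.firstChild.data = "2"

# toxml() returns bytes when an encoding is given; decode to inspect it.
result = doc.toxml(encoding="utf-8").decode("utf-8")
print(result)
```

To overwrite the file, the bytes returned by doc.toxml(encoding="utf-8") can be written to it in binary mode. The existing whitespace between elements is preserved, although minidom puts no line break between the XML declaration and the root element.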
Source: https://stackoverflow.com/questions/38663191/keep-existing-namespaces-when-overwriting-xml-file-with-elementtree-and-python