Question
I have an XML document that looks like this:
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://someurl/Oldschema"
      xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
      xmlns:framework="http://someurl/Oldframework">
  <framework:tag1> ... </framework:tag1>
  <framework:tag2> <tagA> ... </tagA> </framework:tag2>
</root>
All I want to do is change http://someurl/Oldschema
to http://someurl/Newschema
and http://someurl/Oldframework
to http://someurl/Newframework
and leave the rest of the document unchanged. Drawing on the thread "lxml: add namespace to input file", I tried the following:
from lxml import etree

def fix_nsmap(nsmap):
    """Update the old nsmap dict with the new schema URLs. Example:
    fix_nsmap({"framework": "http://someurl/Oldframework",
               None: "http://someurl/Oldschema"}) ==
    {"framework": "http://someurl/Newframework",
     None: "http://someurl/Newschema"}"""
    ...

def convert(xmlfile):  # enclosing function; the name is illustrative
    root = etree.parse(xmlfile).getroot()
    root_tag = root.tag.split("}")[1]  # local name without the "{uri}" prefix
    nsmap = fix_nsmap(root.nsmap)
    new_root = etree.Element(root_tag, nsmap=nsmap)
    new_root[:] = root[:]  # move the children over to the new root
    # ... fix xsi:schemaLocation
    return etree.tostring(new_root, pretty_print=True, encoding="UTF-8",
                          xml_declaration=True)
This produces the right 'attributes' in the root-tag but completely fails for the rest of the document:
<root xmlns:framework="http://someurl/Newframework"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://someurl/Newschema"
      xsi:schemaLocation="http://someurl/Newschema Schema.xsd">
  <ns0:tag1 xmlns:ns0="http://someurl/Oldframework"> ... </ns0:tag1>
  <ns1:tag2 xmlns:ns1="http://someurl/Oldframework"
            xmlns:ns2="http://someurl/Oldschema">
    <ns2:tagA> ... </ns2:tagA>
  </ns1:tag2>
What is wrong with my approach? Is there another way to change the namespaces? Maybe I could use XSLT?
Thanks!
Denis
Answer 1:
All I want to do is change
http://someurl/Oldschema
to http://someurl/Newschema
and http://someurl/Oldframework
to http://someurl/Newframework
and leave the remaining document unchanged.
I'd do a simple textual search-and-replace operation. It's much easier than fiddling with XML nodes. Like this:
with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace("http://someurl/Oldschema", "http://someurl/Newschema")
    data = data.replace("http://someurl/Oldframework", "http://someurl/Newframework")
    outfile.write(data)
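As a quick sanity check of this textual approach, the sketch below applies the same two replacements to an in-memory copy of the question's sample and parses the result with the standard library's xml.etree.ElementTree. The names SAMPLE, REPLACEMENTS and replace_namespaces are made up for illustration, and the "a"/"b" element contents stand in for the "..." in the question.

```python
import xml.etree.ElementTree as ET

# In-memory stand-in for input.xml; "a" and "b" are placeholder contents.
SAMPLE = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://someurl/Oldschema"
      xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
      xmlns:framework="http://someurl/Oldframework">
  <framework:tag1>a</framework:tag1>
  <framework:tag2><tagA>b</tagA></framework:tag2>
</root>"""

# Old URI -> new URI, taken from the question.
REPLACEMENTS = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}


def replace_namespaces(text, replacements):
    """Apply each old -> new substitution to the raw XML text."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


converted = replace_namespaces(SAMPLE, REPLACEMENTS)

# Parsing the result shows which namespace every element ended up in.
root = ET.fromstring(converted)
tags = [el.tag for el in root.iter()]
print(tags)
```

Two things to keep in mind: the bare file name Oldschema.xsd in xsi:schemaLocation is left alone because it does not contain the full URI, and a plain substring replace would also rewrite any longer URI that merely starts with one of the old ones.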
The other question that you were inspired by is about adding a new namespace (and keeping the old ones). But you are trying to modify existing namespace declarations. Creating a new root element and copying the child nodes does not work in this case.
This line:
new_root[:] = root[:]
turns the children of the original root element into children of the new root element. But these child nodes are still associated with the old namespaces. So they have to be modified/recreated too. I guess it might be possible to come up with a reasonable way to do that, but I don't think you need it. Textual search-and-replace is good enough, IMHO.
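For completeness, here is one possible way to do the node-level rewrite with lxml. It is a sketch under assumptions, not a tested recipe for every document: it builds a new root with the updated nsmap, moves the children over, renames every element into the replacement namespace, and then calls etree.cleanup_namespaces to drop the old declarations that are no longer used. The names NS_MAP, rename_namespaces and SAMPLE are hypothetical, and only the root's xsi:schemaLocation value is adjusted; namespaced attributes elsewhere are not handled.

```python
from lxml import etree

# Old URI -> new URI, taken from the question.
NS_MAP = {
    "http://someurl/Oldschema": "http://someurl/Newschema",
    "http://someurl/Oldframework": "http://someurl/Newframework",
}
XSI_LOCATION = "{http://www.w3.org/2001/XMLSchema-instance}schemaLocation"


def rename_namespaces(root, ns_map):
    """Return a new root whose elements use the replacement namespace URIs."""
    def new_uri(uri):
        return ns_map.get(uri, uri)

    def new_name(name):
        q = etree.QName(name)
        if q.namespace is None:
            return q.localname
        return "{%s}%s" % (new_uri(q.namespace), q.localname)

    # Same prefixes as before, bound to the new URIs.
    new_nsmap = {prefix: new_uri(uri) for prefix, uri in root.nsmap.items()}
    new_root = etree.Element(new_name(root.tag), nsmap=new_nsmap)
    new_root.text = root.text
    for name, value in root.attrib.items():
        new_root.set(name, value)

    # Rewrite the URI tokens inside xsi:schemaLocation, if present.
    location = new_root.get(XSI_LOCATION)
    if location is not None:
        new_root.set(XSI_LOCATION, " ".join(new_uri(t) for t in location.split()))

    new_root[:] = root[:]  # moves the children; they still carry the old URIs

    for el in new_root.iter():
        if isinstance(el.tag, str):  # skip comments and processing instructions
            el.tag = new_name(el.tag)

    # Remove namespace declarations that no element or attribute uses any more.
    etree.cleanup_namespaces(new_root)
    return new_root


# In-memory stand-in for the question's document; "a" and "b" are placeholders.
SAMPLE = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://someurl/Oldschema"
      xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
      xmlns:framework="http://someurl/Oldframework">
  <framework:tag1>a</framework:tag1>
  <framework:tag2><tagA>b</tagA></framework:tag2>
</root>"""

new_root = rename_namespaces(etree.fromstring(SAMPLE), NS_MAP)
print(etree.tostring(new_root, pretty_print=True).decode())
```

This is noticeably more code than the two replace calls, which is the main argument for the textual approach when the old URIs appear nowhere else in the file.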
Source: https://stackoverflow.com/questions/20947162/modify-namespaces-in-a-given-xml-document-with-lxml