Modify namespaces in a given xml document with lxml

廉价感情. 提交于 2019-12-18 09:33:41

问题


I have an xml-document that looks like this:

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns="http://someurl/Oldschema"
     xsi:schemaLocation="http://someurl/Oldschema Oldschema.xsd"
     xmlns:framework="http://someurl/Oldframework">
   <framework:tag1> ... </framework:tag1>
   <framework:tag2> <tagA> ... </tagA> </framwork:tag2>
</root>

All I want to do is change http://someurl/Oldschema to http://someurl/Newschema and http://someurl/Oldframework to http://someurl/Newframework and leave the remaining document unchanged. With some insights from this thread lxml: add namespace to input file, I tried the following:

def fix_nsmap(nsmap, tag):
    """update the old nsmap-dict with the new schema-urls. Example:
    fix_nsmap({"framework": "http://someurl/Oldframework",
               None: "http://someurl/Oldschema"}) ==
      {"framework": "http://someurl/Newframework",
       None: "http://someurl/Newschema"}"""
    ...

from lxml import etree
root = etree.parse(XMLFILE).getroot()
root_tag = root.tag.split("}")[1]
nsmap = fix_nsmap(root.nsmap)
new_root = etree.Element(root_tag, nsmap=nsmap)
new_root[:] = root[:]
# ... fix xsi:schemaLocation
return etree.tostring(new_root, pretty_print=True, encoding="UTF-8",
    xml_declaration=True) 

This produces the right 'attributes' in the root-tag but completely fails for the rest of the document:

<network xmlns:framework="http://someurl/Newframework"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://someurl/Newschema"
    xsi:schemaLocation="http://someurl/Newschema Schema.xsd">
<ns0:tag1 xmlns:ns0="http://someurl/Oldframework"> ... </ns0:information>
<ns1:tag2 xmlns:ns1="http://someurl/Oldframework"
          xmlns:ns2="http://someurl/Oldschema">
    <ns2:tagA> ... </ns2:tagA>
</ns1:tag2>

What is wrong with my approach? Is there any other way to change the namespaces? Maybe I could use xslt?

Thanks!

Denis


回答1:


All I want to do is change http://someurl/Oldschema to http://someurl/Newschema and http://someurl/Oldframework to http://someurl/Newframework and leave the remaining document unchanged.

I'd do a simple textual search-and-replace operation. It's much easier than fiddling with XML nodes. Like this:

with open("input.xml", "r") as infile, open("output.xml", "w") as outfile:
    data = infile.read()
    data = data.replace("http://someurl/Oldschema", "http://someurl/Newschema")
    data = data.replace("http://someurl/Oldframework", "http://someurl/Newframework")
    outfile.write(data)

The other question that you were inspired by is about adding a new namespace (and keeping the old ones). But you are trying to modify existing namespace declarations. Creating a new root element and copying the child nodes does not work in this case.

This line:

new_root[:] = root[:]

turns the children of the original root element into children of the new root element. But these child nodes are still associated with the old namespaces. So they have to be modified/recreated too. I guess it might be possible to come up with a reasonable way to do that, but I don't think you need it. Textual search-and-replace is good enough, IMHO.



来源:https://stackoverflow.com/questions/20947162/modify-namespaces-in-a-given-xml-document-with-lxml

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!