Saving XML using ETree in Python: namespaces are not retained, ns0/ns1 prefixes are added, and xmlns attributes are removed

轮回少年 2020-12-18 22:22

I see there are similar questions here, but nothing that has totally helped me. I've also looked at the official documentation on namespaces but can't find anything that

2 answers
  • 2020-12-18 22:39

    First off, welcome to the StackOverflow network! Technically @anand-s-kumar is correct. However, there was a minor misuse of the tostring function, and the namespaces might not always be known by the code or be the same between tags or XML files. Also, inconsistencies between the lxml and xml.etree libraries, and between Python 2.x and 3.x, make handling this difficult.

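    This function iterates through all of the elements in the XML tree that is passed in and edits their tags to remove the namespaces. Note that by doing this, some data may be lost: the namespace information is discarded, so elements with the same local name in different namespaces can no longer be told apart.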

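    import re

    def remove_namespaces(tree):
        # iter() replaces getiterator(), which was removed in Python 3.9
        for el in tree.iter():
            # Strip a leading "{namespace-uri}" from the tag, if present
            match = re.match(r"^(?:\{.*?\})?(.*)$", el.tag)
            if match:
                el.tag = match.group(1)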
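
    As a rough illustration of how the function above behaves, here is a minimal sketch that runs it on a small in-memory document. The sample XML and the example.com namespace URIs are made up for this demo and are not from the original question. It also shows the data loss mentioned above: the two item elements started out in different namespaces and end up indistinguishable.

```python
import re
import xml.etree.ElementTree as ET


def remove_namespaces(tree):
    # Same approach as the function above: drop a leading "{uri}" from each tag.
    for el in tree.iter():
        match = re.match(r"^(?:\{.*?\})?(.*)$", el.tag)
        if match:
            el.tag = match.group(1)


# Hypothetical sample document with a default namespace and a prefixed one.
doc = (
    '<root xmlns="http://example.com/a" xmlns:b="http://example.com/b">'
    "<b:item>1</b:item><item>2</item>"
    "</root>"
)

root = ET.fromstring(doc)
remove_namespaces(root)

# Serialize to a str (not bytes) so the result is easy to inspect.
result = ET.tostring(root, encoding="unicode")
print(result)  # <root><item>1</item><item>2</item></root>
```

    Note that this only rewrites element tags. Namespaced attribute names, if any, are left untouched.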
    

    I myself just ran into this problem, and hacked together a quick solution. I tested this on about 81,000 XML files (averaging around 150 MB each) that had this problem, and all of them were fixed. Note that this isn't exactly an optimal solution, but it is relatively efficient and worked quite well for me.

    CREDIT: Idea and code structure originally from Jochen Kupperschmidt.

  • 2020-12-18 22:40

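    You need to register the prefix and the namespace before you read the XML with fromstring() to avoid the default namespace prefixes (such as ns0 and ns1) in the output. Strictly speaking, the registry is only consulted when the tree is serialized, so registering before parsing is simply the safe order.

    You can use the ET.register_namespace() function for that. Example: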

    ET.register_namespace('<prefix>','http://Test.the.Sdk/2010/07')
    ET.register_namespace('a','http://schema.test.org/2004/07/Test.Soa.Vocab')
    

    You can pass an empty string instead of <prefix> if you do not want a prefix, which makes that namespace the default one.

    Example/Demo:

    >>> import xml.etree.ElementTree as ET
    >>> r = ET.fromstring('<a xmlns="blah">a</a>')
    >>> ET.tostring(r)
    b'<ns0:a xmlns:ns0="blah">a</ns0:a>'
    >>> ET.register_namespace('','blah')
    >>> r = ET.fromstring('<a xmlns="blah">a</a>')
    >>> ET.tostring(r)
    b'<a xmlns="blah">a</a>'
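
    When the namespaces are not known ahead of time, one possible approach is to collect them from the document itself and register each one before serializing. The sketch below assumes the standard library xml.etree.ElementTree; the helper name register_all_namespaces and the sample XML are hypothetical and not part of the answer above. It relies on the "start-ns" events of ET.iterparse(), which yield (prefix, uri) pairs.

```python
import io
import xml.etree.ElementTree as ET


def register_all_namespaces(source):
    """Register every prefix/URI pair declared in the XML source.

    `source` is a filename or a binary file object. Returns the mapping found.
    """
    found = {}
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # The default namespace is reported with an empty prefix.
        ET.register_namespace(prefix, uri)
        found[prefix] = uri
    return found


# Hypothetical sample document, made up for illustration.
xml_text = '<a xmlns="blah" xmlns:x="http://example.com/x"><x:b>1</x:b></a>'

namespaces = register_all_namespaces(io.BytesIO(xml_text.encode("utf-8")))
root = ET.fromstring(xml_text)

# With the prefixes registered, the original prefixes are used on output.
output = ET.tostring(root, encoding="unicode")
print(output)
```

    Keep in mind that the registry is global to the process, and that register_namespace() raises ValueError for reserved prefixes of the form ns followed by digits, so documents that already use such prefixes would need separate handling.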
    