Faithfully Preserve Comments in Parsed XML (Python 2.7)

流过昼夜 提交于 2019-11-29 01:38:28

Tested with Python 2.7 and 3.5, the following code should work as intended.

#!/usr/bin/env python
# CommentedTreeBuilder.py
from xml.etree import ElementTree

class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)

Then, in the main code use

parser = ET.XMLParser(target=CommentedTreeBuilder())

as the parser instead of the current one.

By the way, comments work correctly out of the box with lxml. That is, you can just do

import lxml.etree as ET
tree = ET.parse(filename)

without needing any of the above.

Martin's Code didn't work for me. I modified the same with the following which works as intended.

import xml.etree.ElementTree as ET

class CommentedTreeBuilder(ET.XMLTreeBuilder):
    def __init__(self, *args, **kwargs):
        super(CommentedTreeBuilder, self).__init__(*args, **kwargs)
        self._parser.CommentHandler = self.comment

    def comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)

This is the test

    parser=CommentedTreeBuilder()
    tree = ET.parse(filename, parser)
    tree.write('out.xml')

Looks like both answers from @Martin and @sukhbinder didn't work for me... So made this as a workable completed solution on python 3.x

from xml.etree import ElementTree

string = '''<?xml version="1.0"?>
<data>
    <!--Test
    -->
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
</data>'''

class CommentedTreeBuilder(ElementTree.TreeBuilder):
    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)

parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
tree = ElementTree.fromstring(string, parser)
print(tree.find("./*[0]").text)
# or ElementTree.parse(filename, parser)
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!