Faithfully Preserve Comments in Parsed XML

甜味超标 2020-12-03 10:32

I'd like to preserve comments as faithfully as possible while manipulating XML.

I managed to preserve comments, but the contents are getting XML-escaped.



        
4 Answers
  • 2020-12-03 11:01

    Neither @Martin's nor @sukhbinder's answer worked for me, so here is a complete working solution for Python 3.x:

    from xml.etree import ElementTree
    
    string = '''<?xml version="1.0"?>
    <data>
        <!--Test
        -->
        <country name="Liechtenstein">
            <rank>1</rank>
            <year>2008</year>
            <gdppc>141100</gdppc>
            <neighbor name="Austria" direction="E"/>
            <neighbor name="Switzerland" direction="W"/>
        </country>
    </data>'''
    
    class CommentedTreeBuilder(ElementTree.TreeBuilder):
        def comment(self, data):
            self.start(ElementTree.Comment, {})
            self.data(data)
            self.end(ElementTree.Comment)
    
    parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
    root = ElementTree.fromstring(string, parser)  # fromstring returns the root element
    print(root[0].text)  # the first child of <data> is the comment
    # or ElementTree.parse(filename, parser)
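
    To check that comment contents are not XML-escaped on output, here is a minimal round-trip sketch using the same builder. The helper name `roundtrip` and the sample string are illustrative only and not part of the original answer.

```python
from xml.etree import ElementTree


class CommentedTreeBuilder(ElementTree.TreeBuilder):
    """Same builder as in the answer: store each comment as a Comment node."""

    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)


def roundtrip(xml_text):
    """Hypothetical helper: parse with comments kept, then serialize to str."""
    parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
    root = ElementTree.fromstring(xml_text, parser)
    return ElementTree.tostring(root, encoding="unicode")


# The comment contains characters that would be escaped in element text.
sample = "<data><!-- a < b & c --><rank>1</rank></data>"
result = roundtrip(sample)
print(result)
```

    The serializer writes the text of Comment nodes verbatim, so `<` and `&` inside the comment should come back unchanged.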
    
  • 2020-12-03 11:03

    The following code, tested with Python 2.7 and 3.5, should work as intended.

    #!/usr/bin/env python
    # CommentedTreeBuilder.py
    from xml.etree import ElementTree
    
    class CommentedTreeBuilder(ElementTree.TreeBuilder):
        def comment(self, data):
            self.start(ElementTree.Comment, {})
            self.data(data)
            self.end(ElementTree.Comment)
    

    Then, in the main code use

    parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
    

    as the parser instead of the current one.
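
    As a rough sketch of that workflow (parse, edit, write back), the snippet below uses in-memory `io.StringIO` objects in place of real files. The sample document and the edit to `<year>` are made up for illustration.

```python
import io
from xml.etree import ElementTree


class CommentedTreeBuilder(ElementTree.TreeBuilder):
    """Same builder as above: store each comment as a Comment node."""

    def comment(self, data):
        self.start(ElementTree.Comment, {})
        self.data(data)
        self.end(ElementTree.Comment)


# Stand-in for an input file (assumption: real code would pass a filename).
source = io.StringIO("<data><!--keep me--><year>2008</year></data>")

parser = ElementTree.XMLParser(target=CommentedTreeBuilder())
tree = ElementTree.parse(source, parser)

# Modify an ordinary element; the comment node is left alone.
tree.find("year").text = "2009"

# Stand-in for an output file.
out = io.StringIO()
tree.write(out, encoding="unicode")
output = out.getvalue()
print(output)
```

    With a real file, `ElementTree.parse(filename, parser)` and `tree.write(filename)` follow the same pattern.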

    By the way, comments work correctly out of the box with lxml. That is, you can just do

    import lxml.etree as ET
    tree = ET.parse(filename)
    

    without needing any of the above.

  • 2020-12-03 11:14

    Martin's code didn't work for me. I modified it as follows, and it now works as intended.

    import xml.etree.ElementTree as ET
    
    class CommentedTreeBuilder(ET.XMLTreeBuilder):
        def __init__(self, *args, **kwargs):
            super(CommentedTreeBuilder, self).__init__(*args, **kwargs)
            self._parser.CommentHandler = self.comment
    
        def comment(self, data):
            self._target.start(ET.Comment, {})
            self._target.data(data)
            self._target.end(ET.Comment)
    

    This is the test:

    parser = CommentedTreeBuilder()
    tree = ET.parse(filename, parser)
    tree.write('out.xml')
    
  • 2020-12-03 11:18

    Python 3.8 added the insert_comments argument to TreeBuilder. From the documentation:

    class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False)

    When insert_comments and/or insert_pis is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it).

    Example:

    from xml.etree import ElementTree
    parser = ElementTree.XMLParser(target=ElementTree.TreeBuilder(insert_comments=True))
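
    A short sketch of the documented behavior, assuming Python 3.8 or later. The sample string is invented for illustration: it has one comment before the root element and one inside it.

```python
from xml.etree import ElementTree

# One comment outside the root element, one inside it.
xml_text = "<!--outside--><data><!--inside--><rank>1</rank></data>"

# insert_comments requires Python 3.8+.
builder = ElementTree.TreeBuilder(insert_comments=True)
parser = ElementTree.XMLParser(target=builder)
root = ElementTree.fromstring(xml_text, parser)

serialized = ElementTree.tostring(root, encoding="unicode")
print(serialized)
```

    Only the comment inside `<data>` should appear in the output, which matches the "but not outside of it" note in the documentation quoted above.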
    