Python etree control empty tag format

小蘑菇 2020-12-06 10:32

When creating an XML file with Python's etree, if I write an empty tag to the file using SubElement, I get:


5 answers
  • 2020-12-06 11:16

    Adding an empty text is another option:

    etree.SubElement(parent, 'child_tag_name').text=''
    

    But note that this will change not only the representation but also the structure of the document: i.e. child_el.text will be '' instead of None.

    Oh, and like Martijn said, try to use better libraries.
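
The structural change mentioned above can be checked directly. This is a minimal sketch using the standard library's xml.etree.ElementTree; the element names are made up for illustration. How the two children are serialized depends on the etree implementation in use, so the sketch only inspects the .text values.

```python
import xml.etree.ElementTree as ET

root = ET.Element("parent")
untouched = ET.SubElement(root, "untouched")   # .text is never assigned
emptied = ET.SubElement(root, "emptied")
emptied.text = ""                               # the "empty text" option

# The two children are no longer structurally identical.
print(repr(untouched.text))  # None
print(repr(emptied.text))    # ''
```

Code that later tests `child.text is None` will therefore treat the two elements differently.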

  • 2020-12-06 11:19

    Paraphrasing the code, the version of ElementTree.py I use contains the following in a _write method:

    write('<' + tagname)
    ...
    if node.text or len(node): # this line is literal
        write('>')
        ...
        write('</%s>' % tagname)
    else:
        write(' />')
    

    To steer the program counter I created the following:

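    class AlwaysTrueString(str):
        # Always truthy, even when empty. The truth hook is named
        # __nonzero__ on Python 2 and __bool__ on Python 3.
        def __nonzero__(self): return True
        __bool__ = __nonzero__
    true_empty_string = AlwaysTrueString()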
    

    Then I set node.text = true_empty_string on those ElementTree nodes where I want an open-close tag rather than a self-closing one.

    By "steering the program counter" I mean constructing a set of inputs—in this case an object with a somewhat curious truth test—to a library method such that the invocation of the library method traverses its control flow graph the way I want it to. This is ridiculously brittle: in a new version of the library, my hack might break—and you should probably treat "might" as "almost guaranteed". In general, don't break abstraction barriers. It just worked for me here.

  • 2020-12-06 11:23

    This was solved directly in Python 3.4. Since then, the write method of xml.etree.ElementTree.ElementTree has had a short_empty_elements parameter, which:

    controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.

    More details in the xml.etree documentation.
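
As a rough illustration of the parameter described above, the sketch below writes the same tree twice, once with each setting. It assumes Python 3.4 or later; the helper name `serialize` and the element names are hypothetical, and an in-memory io.StringIO stands in for a real file.

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("root")
ET.SubElement(root, "child")          # no text and no children
tree = ET.ElementTree(root)

def serialize(tree, short_empty_elements):
    """Write the tree to an in-memory text buffer and return the result."""
    buf = io.StringIO()
    tree.write(buf, encoding="unicode",
               short_empty_elements=short_empty_elements)
    return buf.getvalue()

short_form = serialize(tree, True)    # default behaviour
long_form = serialize(tree, False)    # start/end tag pair

print(short_form)  # <root><child /></root>
print(long_form)   # <root><child></child></root>
```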

  • 2020-12-06 11:25

    If you have sed available, you could pipe the output of your python script to

    sed -e "s/<\([^>]*\) \/>/<\1><\/\1>/g"
    

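    This finds every occurrence of <Tag /> and replaces it with <Tag></Tag>. Note that it is only correct for tags without attributes, because the whole captured group is repeated in the closing tag.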
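
Where sed is not available, the same text-level substitution can be sketched in Python. This is only an illustration: the helper name `expand_empty_tags` and the pattern are assumptions, not part of the answer. The pattern captures the tag name separately from any attributes so that only the name is repeated in the closing tag. Like the sed version, it rewrites text blindly and would also touch matching text inside comments or CDATA sections.

```python
import re

# Hypothetical helper, not from the original answer.
# Group 1: tag name. Group 2: optional attributes (kept on the start tag only).
_SELF_CLOSING = re.compile(r"<([^\s/>]+)([^>]*?)\s*/>")

def expand_empty_tags(xml_text):
    """Rewrite <Tag attr="v" /> as <Tag attr="v"></Tag>."""
    return _SELF_CLOSING.sub(r"<\1\2></\1>", xml_text)

expanded = expand_empty_tags('<root><a /><b id="1"/><c>text</c></root>')
print(expanded)  # <root><a></a><b id="1"></b><c>text</c></root>
```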

  • 2020-12-06 11:34

    As of Python 3.4, you can use the short_empty_elements argument for both the tostring() function and the ElementTree.write() method:

    >>> from xml.etree import ElementTree as ET
    >>> ET.tostring(ET.fromstring('<mytag/>'), short_empty_elements=False)
    b'<mytag></mytag>'
    

    In older Python versions (2.7 through 3.3), you can use the html method as a workaround to write out the document:

    >>> from xml.etree import ElementTree as ET
    >>> ET.tostring(ET.fromstring('<mytag/>'), method='html')
    '<mytag></mytag>'
    

    Both the ElementTree.write() method and the tostring() function support the method keyword argument.

    On even earlier versions of Python (2.6 and before) you can install the external ElementTree library; version 1.3 supports that keyword.

    Yes, it sounds a little weird, but the html method writes most empty elements as a start and end tag. The HTML void elements are the exception: link, input, br and the like are written as a bare start tag (for example <br>) with no end tag, so a document containing them is no longer well-formed XML. Still, it's that or upgrade your Fortran XML parser to actually parse standards-compliant XML!
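
The caveat about the html method can be seen in a small sketch, here run against Python 3's standard library with a made-up document: an ordinary empty element gets a start and end tag, while br is treated as an HTML void element.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<root><mytag/><br/></root>")

# method="html" expands ordinary empty elements but not HTML void elements.
html_output = ET.tostring(doc, method="html")
print(html_output)  # b'<root><mytag></mytag><br></root>'
```

If the document uses tag names that collide with HTML void elements, this workaround is probably not suitable.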
