Keep lxml from creating self-closing tags

Submitted by 白昼怎懂夜的黑 on 2020-07-20 10:38:11

Question


I have an (old) tool which does not understand self-closing tags like <STATUS/>. So we need to serialize our XML files with explicit opening and closing tags, like this: <STATUS></STATUS>.

Currently I have:

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS/>.</ERROR>'

How can I serialize with opened/closed tags?

<ERROR>The status is <STATUS></STATUS>.</ERROR>

Solution

Given by wildwilhelm, below:

>>> from lxml import etree

>>> para = """<ERROR>The status is <STATUS></STATUS>.</ERROR>"""
>>> tree = etree.XML(para)
>>> for status_elem in tree.xpath("//STATUS[string() = '']"):
...     status_elem.text = ""
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

Answer 1:


It seems like the <STATUS> tag gets assigned a text attribute of None:

>>> tree[0]
<Element STATUS at 0x11708d4d0>
>>> tree[0].text
>>> tree[0].text is None
True

If you set the text attribute of the <STATUS> tag to an empty string, you should get what you're looking for:

>>> tree[0].text = ''
>>> etree.tostring(tree)
'<ERROR>The status is <STATUS></STATUS>.</ERROR>'

With this in mind, you can probably walk the tree and fix up the text attributes before writing out your XML. Something like this:

# prevent creation of self-closing tags
for node in tree.iter():
    if node.text is None:
        node.text = ''
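
The loop above can be wrapped in a small helper. What follows is only a sketch: `expand_empty_elements` is a hypothetical name (not part of lxml), and it assumes you want to touch only leaf elements that have no text at all, skipping comments and processing instructions.

```python
from lxml import etree


def expand_empty_elements(root):
    """Give every childless, text-less element an empty string as text.

    Hypothetical helper (not part of lxml); it mutates the tree in place.
    """
    for node in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(node.tag, str):
            continue
        # Only leaf elements without text would be written as <TAG/>.
        if len(node) == 0 and node.text is None:
            node.text = ""
    return root


tree = etree.XML("<ERROR>The status is <STATUS/>.</ERROR>")
expand_empty_elements(tree)
# encoding="unicode" makes tostring return str instead of bytes
serialized = etree.tostring(tree, encoding="unicode")
print(serialized)
```

Restricting the change to leaf elements leaves elements that already contain children or text untouched, so the rest of the document should serialize as before.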



Answer 2:


If the lxml DOM you are serializing is HTML, you can use

etree.tostring(html_dom, method='html')

to prevent self-closing tags like <a />.
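
As a rough illustration of the `method='html'` approach, here is a minimal sketch. The HTML fragment is made up for the example, and parsing it with `lxml.html.fromstring` is an assumption; the answer does not say how `html_dom` was obtained.

```python
from lxml import etree, html

# Parse a made-up HTML fragment; for a single-element fragment like this,
# html.fromstring returns that element (the <div>).
html_dom = html.fromstring('<div><a href="#"></a><br></div>')

# Serialize with the HTML serializer instead of the default XML one.
html_out = etree.tostring(html_dom, method="html", encoding="unicode")
print(html_out)
```

Note that the HTML serializer follows HTML rules, so void elements such as <br> are written without a closing tag. That may not suit a consumer that expects well-formed XML, such as the tool described in the question.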



Source: https://stackoverflow.com/questions/41890415/keep-lxml-from-creating-self-closing-tags
