XML pretty print fails in Python lxml

 ̄綄美尐妖づ submitted on 2019-12-02 02:28:55

Use the "xml" output method when writing (that's the default so it does not have to be given explicitly).

Set the text property of the c element to an empty string to ensure that the element gets serialized as <c></c>.

Code:

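import lxml.etree as et

# Drop whitespace-only text nodes so that pretty_print can re-indent the tree.
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse('in.xml', parser)

b = xml_doc.getroot().find('b')
c = et.Element('c')
c.text = ''  # empty string (not None), so the element is written as <c></c>
b.append(c)

# method='xml' is the default, so it is not passed explicitly.
xml_doc.write('out.xml', pretty_print=True)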

Result (out.xml):

<a>
  <b>
    <c></c>
  </b>
</a>
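
To see the effect of the empty string in isolation, here is a minimal in-memory sketch. The input string stands in for in.xml (its real content is not shown above, so this is an assumption), and the second element d is hypothetical, added only for comparison.

```python
import lxml.etree as et

# Assumed stand-in for in.xml; the real file content is not shown in the answer.
parser = et.XMLParser(remove_blank_text=True)
root = et.fromstring('<a><b/></a>', parser)
b = root.find('b')

no_text = et.SubElement(b, 'c')      # text stays None
empty_text = et.SubElement(b, 'd')   # hypothetical element, for comparison only
empty_text.text = ''                 # empty string instead of None

# encoding='unicode' makes tostring() return str instead of bytes.
serialized = et.tostring(root, pretty_print=True, encoding='unicode')
print(serialized)
```

With the default XML output method, c should be written in the short form <c/>, while d keeps its separate closing tag, <d></d>.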

Thanks to mzjn's comment, I found a working, though not elegant, solution. Since I need empty elements to remain in HTML syntax (<c></c> rather than <c/>), simply using method='xml' is not sufficient.

Formatting the document twice yields the desired result:

import lxml.etree as et

# Pass 1: strip blank text and let the XML serializer create the indentation.
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse('in.xml', parser)
xml_doc.getroot().find('b').append(et.Element('c'))
xml_doc.write('out.xml', pretty_print=True)

# Pass 2: re-read the file, keeping the indentation, and write it as HTML.
parser = et.XMLParser(remove_blank_text=False)
xml_doc = et.parse('out.xml', parser)
xml_doc.write('out.xml', pretty_print=True, method='html')

Result (out.xml):

<a>
  <b>
    <c></c>
  </b>
</a>

Not elegant, but working.
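
The same two passes can probably be done in memory, without writing out.xml twice. This is only a sketch under assumptions: pretty_with_html_syntax is a hypothetical helper name, the input string stands in for in.xml, and the second pass omits pretty_print because the indentation is already part of the reparsed tree.

```python
import lxml.etree as et

def pretty_with_html_syntax(root):
    """Hypothetical helper: two-pass formatting without a temporary file."""
    # Pass 1: the XML serializer inserts the indentation.
    indented = et.tostring(root, pretty_print=True)
    # Pass 2: reparse with the default parser, which keeps whitespace-only
    # text, then serialize with the HTML method so that empty elements
    # are written with a separate closing tag.
    reparsed = et.fromstring(indented)
    return et.tostring(reparsed, method='html', encoding='unicode')

# Assumed stand-in for in.xml; the real file content is not shown in the answer.
parser = et.XMLParser(remove_blank_text=True)
root = et.fromstring('<a><b/></a>', parser)
root.find('b').append(et.Element('c'))

result = pretty_with_html_syntax(root)
print(result)
```

The trade-off is the same as in the file-based version: the tree is serialized and parsed one extra time, which is acceptable for small documents.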
