XML pretty print fails in Python lxml

Submitted by 一个人想着一个人 on 2019-12-20 03:42:09

Question


I am trying to read, modify, and write an XML file with lxml 4.1.1 in Python 2.7.6.

My code:

import lxml.etree as et

fn_xml_in = 'in.xml'
parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse(fn_xml_in, parser)
xml_doc.getroot().find('b').append(et.Element('c'))
xml_doc.write('out.xml', method='html', pretty_print=True)

The input file in.xml looks like this:

<a>
    <b/>
</a>

And the produced output file out.xml:

<a>
    <b><c></c></b>
</a>

Or, when remove_blank_text=True is set (as in the code above):

<a><b><c></c></b></a>

I would have expected lxml to insert line breaks and indentation within the b element:

<a>
    <b>
        <c></c>
    </b>
</a>

How can I achieve this?

I have tried some tidy lib wrappers, but they seem to specialize in HTML rather than XML.

I have also tried adding newline characters as b's tail, but then even the indentation is broken.

Edit: I need the c element to remain split into an opening and a closing tag: <c></c>. This is why I use method='html' in the example.


Answer 1:


Use the "xml" output method when writing (it is the default, so it does not have to be given explicitly).

Set the text property of the c element to an empty string to ensure that the element gets serialized as <c></c>.

Code:

import lxml.etree as et

parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse('in.xml', parser)

b = xml_doc.getroot().find('b')
c = et.Element('c')
c.text = ''  # empty string (not None), so c is written as <c></c> rather than <c/>
b.append(c)

xml_doc.write('out.xml', pretty_print=True)

Result (out.xml):

<a>
  <b>
    <c></c>
  </b>
</a>
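
The same idea can be tried without touching any files. The following is a minimal sketch, not part of the original answer: the helper name append_empty_child and its parameters are made up for illustration, and it works on an in-memory string instead of in.xml and out.xml.

```python
import lxml.etree as et


def append_empty_child(xml_text, parent_tag="b", child_tag="c"):
    """Hypothetical helper: append an empty child element to the first
    matching parent and return the pretty-printed XML as a string."""
    # Drop whitespace-only text nodes so pretty_print can re-indent freely.
    parser = et.XMLParser(remove_blank_text=True)
    root = et.fromstring(xml_text, parser)

    child = et.Element(child_tag)
    # An empty string (instead of None) makes lxml write <c></c>, not <c/>.
    child.text = ""
    root.find(parent_tag).append(child)

    # method='xml' is the default; encoding='unicode' returns str, not bytes.
    return et.tostring(root, pretty_print=True, encoding="unicode")


if __name__ == "__main__":
    print(append_empty_child("<a>\n    <b/>\n</a>"))
```

Calling it with the input from the question should print the same five-line result as shown above.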



Answer 2:


Thanks to mzjn's comment, I found a working, if inelegant, solution. Since I need empty elements to remain in HTML syntax, merely using method='xml' is not sufficient.

Formatting the document twice yields the desired result:

import lxml.etree as et

parser = et.XMLParser(remove_blank_text=True)
xml_doc = et.parse('in.xml', parser)
xml_doc.getroot().find('b').append(et.Element('c'))
xml_doc.write('out.xml', pretty_print=True)

parser = et.XMLParser(remove_blank_text=False)
xml_doc = et.parse('out.xml', parser)
xml_doc.write('out.xml', pretty_print=True, method='HTML')

results in:

<a>
  <b>
    <c></c>
  </b>
</a>

Not elegant, but working.
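
A possible single-pass variant, offered only as a sketch: it assumes lxml 4.5 or newer, where et.indent() is available (the lxml 4.1.1 used in the question does not have it). The indentation is written into the tree explicitly, so the HTML serializer can be used without pretty_print. The helper name indent_and_serialize_html is hypothetical, and the example works on an in-memory string rather than on files.

```python
import lxml.etree as et


def indent_and_serialize_html(xml_text, parent_tag="b", child_tag="c"):
    """Hypothetical helper: append an empty child, indent the tree in place,
    and serialize it once with the 'html' output method."""
    parser = et.XMLParser(remove_blank_text=True)
    root = et.fromstring(xml_text, parser)
    root.find(parent_tag).append(et.Element(child_tag))

    # Assumption: lxml >= 4.5. indent() inserts the newline and indentation
    # whitespace into .text and .tail of the elements.
    et.indent(root, space="  ")

    # The 'html' method writes empty non-void elements as <c></c>; no
    # pretty_print is needed because the whitespace is already in the tree.
    return et.tostring(root, method="html", encoding="unicode")


if __name__ == "__main__":
    print(indent_and_serialize_html("<a>\n    <b/>\n</a>"))
```

This avoids writing out.xml and parsing it a second time, at the cost of requiring a newer lxml release.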



Source: https://stackoverflow.com/questions/47791342/xml-pretty-print-fails-in-python-lxml
