lxml: Force to convert newlines to entities

Posted by 早过忘川 on 2019-12-25 01:54:23

Question


Is there a way to output newlines inside text elements as &#13; entities? Currently, newlines are inserted into the output as-is:

from lxml import etree
from lxml.builder import E
etree.tostring(E.a('one\ntwo'), pretty_print=True)
b'<a>one\ntwo</a>\n'

Desired output:

b'<a>one&#13;two</a>\n'

Answer 1:


After looking through the lxml docs, it looks like there is no way to force certain characters to be printed as escaped entities. It also looks like the list of characters that gets escaped varies by the output encoding.
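To illustrate the encoding dependence mentioned above, here is a minimal sketch that serializes the same element twice with etree.tostring. The sample text (a non-ASCII character plus a newline) is an assumption chosen for illustration, not something from the question.

```python
from lxml import etree
from lxml.builder import E

# Illustrative element: one non-ASCII character and one newline in the text.
elem = E.a('caf\u00e9\nbar')

# The default output encoding is ASCII, so the non-ASCII character is
# written as a numeric character reference; the newline stays literal.
ascii_bytes = etree.tostring(elem)

# With UTF-8 output the same character is written as raw bytes instead.
utf8_bytes = etree.tostring(elem, encoding='utf-8')

print(ascii_bytes)
print(utf8_bytes)
```

In both cases the newline is left untouched, which is why some form of post-processing or a custom formatter is needed.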

With all of that said, I'd use BeautifulSoup's prettify() on top of lxml to get the job done:

from bs4 import BeautifulSoup as Soup
from xml.sax.saxutils import escape

def extra_entities(s):
    return escape(s).replace('\n', '&#13;')

soup = Soup("<a>one\ntwo</a>", 'lxml-xml')
print(soup.prettify(formatter=extra_entities))

Output:

<?xml version="1.0" encoding="utf-8"?>
<a>
 one&#13;two
</a>

Note that a newline (\n) actually corresponds to &#10;, while &#13; is the carriage return (\r). I won't argue the point, though, because I can't test the FCPXML format locally.
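If both characters need to be handled, the formatter can be sketched on its own with the optional entities mapping that xml.sax.saxutils.escape accepts. The names NEWLINE_ENTITIES and newline_entities are hypothetical, and whether the target format wants &#10;, &#13;, or both is an assumption to verify against it.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping: carriage return and line feed to their
# numeric character references.
NEWLINE_ENTITIES = {'\r': '&#13;', '\n': '&#10;'}

def newline_entities(s):
    # escape() replaces &, < and > first and only then applies the extra
    # mapping, so the '&' in '&#10;' is not escaped a second time.
    return escape(s, NEWLINE_ENTITIES)

print(newline_entities('one\ntwo'))    # one&#10;two
print(newline_entities('a & b\r\nc'))  # a &amp; b&#13;&#10;c
```

A function like this could be passed as the formatter argument to prettify() in place of extra_entities.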



Source: https://stackoverflow.com/questions/49591782/lxml-force-to-convert-newlines-to-entities
