Why does lxml.html sometimes swallow/remove whitespace instead of preserving it?
Question

Given the following code, one might reasonably expect almost exactly the same HTML string that was fed into lxml to be spit back out.

    from lxml import html

    HTML_TEST_STRING = r"""
    <pre>
    <em>abc</em>
    <em>def</em>
    <sub>ghi</sub>
    <sub>jkl</sub>
    <em>mno</em>
    <em>pqr</em>
    </pre>
    """

    parser = html.HTMLParser( remove_blank_text=False )
    doc = html.fromstring( HTML_TEST_STRING, parser=parser )
    # html_out_string was never assigned in the snippet as posted; serializing
    # the parsed tree back to a string is assumed to be the intended step.
    html_out_string = html.tostring( doc, encoding="unicode" )
    print( html_out_string )

Instead, even though everything is contained within a <pre> pre-formatted code