Python ElementTree won't convert non-breaking spaces when using UTF-8 for output

执笔经年 2021-02-20 14:45

I'm trying to parse, manipulate, and output HTML using Python's ElementTree:

import sys
from cStringIO import StringIO
from xml.etree import ElementTree as E


        
5 answers
  •  难免孤独
    2021-02-20 15:37

    HTML is not the same as XML, so HTML entities like &nbsp; will not work. Ideally, if you are trying to pass that information via XML, you could XML-encode the above data first, so that it looks something like this:

    
    
    <html>
    <body>
    <p>Less than &amp;lt;</p>
    <p>Non-breaking space &amp;nbsp;</p>
    </body>
    </html>
    
    
    

    Then, after parsing the XML, you can HTML-unescape the string.
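
A minimal sketch of that round trip, assuming Python 3 (the question's `cStringIO` import suggests Python 2, where `html.unescape` is not available). The `mydata` wrapper tag and the sample fragment are made up for illustration and do not come from the question:

```python
import html
from xml.etree import ElementTree as ET

# Hypothetical input: an HTML fragment using an entity that XML does not define.
html_fragment = "<p>Non-breaking space&nbsp;here, less than &lt;</p>"

# 1. Parsing the fragment directly fails: &nbsp; is not a predefined XML entity.
try:
    ET.fromstring(html_fragment)
    parsed_directly = True
except ET.ParseError:
    parsed_directly = False

# 2. XML-encode the fragment by carrying it as text inside a wrapper element.
#    "mydata" is an arbitrary tag name chosen for this sketch.
wrapper = ET.Element("mydata")
wrapper.text = html_fragment
xml_bytes = ET.tostring(wrapper, encoding="utf-8")  # "&" is written as "&amp;"

# 3. Parse the XML; the text comes back with the HTML entities still in place.
recovered = ET.fromstring(xml_bytes).text

# 4. HTML-unescape the string: &nbsp; becomes U+00A0 and &lt; becomes "<".
decoded = html.unescape(recovered)
```

Note that this treats the HTML as an opaque string inside the XML, so the `<p>` element is not available as an ElementTree node until the decoded string is parsed by an HTML-aware parser.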
