Python ElementTree won't convert non-breaking spaces when using UTF-8 for output

执笔经年 2021-02-20 14:45

I'm trying to parse, manipulate, and output HTML using Python's ElementTree:

import sys
from cStringIO import StringIO
from xml.etree import ElementTree as E


        
5 answers
  •  梦谈多话
    2021-02-20 15:33

    XML predefines only five named entities: &lt;, &gt;, &apos;, &quot; and &amp;. &nbsp; and the others come from HTML. So you have a couple of choices.

    1. You can change your source to use numeric entities, like &#160; or &#xA0;, both of which are equivalent to &nbsp;.
    2. You can use a DTD which defines those values.
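
Both choices can be sketched roughly as follows. This is a minimal illustration, assuming Python 3 (the question's own code uses Python 2's cStringIO); the sample markup and all variable names are made up for the example. It also shows how the serialized output differs between the default ASCII encoding and UTF-8.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input containing the HTML-only named entity.
source = "<p>a&nbsp;b</p>"

# &nbsp; is not predefined in XML, so a plain parse fails.
try:
    ET.fromstring(source)
    named_entity_parsed = True
except ET.ParseError:
    named_entity_parsed = False

# Choice 1: swap the named entity for its numeric character reference.
numeric = source.replace("&nbsp;", "&#160;")
root_numeric = ET.fromstring(numeric)

# Choice 2: declare the entity in an internal DTD subset.
with_dtd = '<!DOCTYPE p [<!ENTITY nbsp "&#160;">]>' + source
root_dtd = ET.fromstring(with_dtd)

# Either way, the parsed tree holds a real U+00A0 character.
print(repr(root_numeric.text), repr(root_dtd.text))

# The default (ASCII) output escapes the character as &#160;,
# while UTF-8 output writes it as raw bytes.
ascii_out = ET.tostring(root_numeric)
utf8_out = ET.tostring(root_numeric, encoding="utf-8")
print(ascii_out)
print(utf8_out)
```

A blanket string replacement like the one above is only a rough preprocessing step; it would also rewrite any &nbsp; that appears inside a CDATA section or comment.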

    There is some useful information (it is written about XSLT, but XSLT is written using XML, so the same applies) at the XSLT FAQ.


    The question now appears to include a stack trace, which changes things. Are you sure that the string is in UTF-8? If the non-breaking space arrives as the single byte 0xA0, then the data isn't UTF-8 (which encodes U+00A0 as the two bytes 0xC2 0xA0) but more likely cp1252 or iso-8859-1.
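
One way to check this, sketched under the assumption of Python 3 and a made-up byte string standing in for the real input:

```python
nbsp = "\u00a0"

# U+00A0 takes two bytes in UTF-8 but one byte in ISO-8859-1 / cp1252.
utf8_bytes = nbsp.encode("utf-8")
latin1_bytes = nbsp.encode("iso-8859-1")
print(utf8_bytes, latin1_bytes)

# Hypothetical input where the non-breaking space is a lone 0xA0 byte.
raw = b"a\xa0b"

# A lone 0xA0 is a continuation byte, so it is not valid UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Decoding with the encoding the data is really in recovers the text,
# which can then be re-encoded as proper UTF-8.
decoded = raw.decode("cp1252")
reencoded = decoded.encode("utf-8")
print(valid_utf8, repr(decoded), reencoded)
```

If the decode with UTF-8 fails like this, the fix belongs at the point where the bytes are read, not in the serialization step.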
