ParseError: not well-formed (invalid token) using cElementTree

日久生厌 2020-12-16 11:10

I receive XML strings from an external source that can contain unsanitized user-contributed content.

The following XML string gave a ParseError in cElementTree

13 answers
  •  南笙 (original poster)
    2020-12-16 11:36

    After a lot of searching, I found that the parser rejects certain characters that are illegal in XML, so you have to strip them from the string before parsing. Here's how I did it, and it worked for me:

    import re
    escape_illegal_xml_characters = lambda x: re.sub(u'[\x00-\x08\x0b\x0c\x0e-\x1F\uD800-\uDFFF\uFFFE\uFFFF]', '', x)
    

    And use it like you'd normally do:

    ET.XML(escape_illegal_xml_characters(my_xml_string)) #instead of ET.XML(my_xml_string)
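
For anyone who wants to try this end to end, here is a minimal sketch of the same idea in Python 3. A few assumptions that are not in the original post: it uses `xml.etree.ElementTree` (cElementTree was removed in Python 3.9), the helper is renamed `strip_illegal_xml_characters` because it removes characters rather than escaping them, and `bad_xml` is a made-up input containing a form feed.

```python
import re
import xml.etree.ElementTree as ET

# Same character class as the one-liner above: C0 control characters other
# than tab, LF and CR, plus surrogates, U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_illegal_xml_characters(text):
    """Return text with the characters matched above removed (hypothetical helper)."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# Hypothetical input: a form feed (\x0c) inside user-contributed text.
bad_xml = "<comment>hello\x0cworld</comment>"

try:
    ET.XML(bad_xml)
    parse_failed = False
except ET.ParseError:
    # The raw string is rejected by the parser.
    parse_failed = True

# After stripping, the same document parses.
root = ET.XML(strip_illegal_xml_characters(bad_xml))
print(parse_failed, root.text)
```

Note that this silently drops data. If the stripped characters matter to you, it may be better to log them or replace them with a visible placeholder instead of an empty string.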
    
