ParseError: not well-formed (invalid token) using cElementTree

日久生厌 2020-12-16 11:10

I receive XML strings from an external source that can contain unsanitized, user-contributed content.

The following XML string gave a ParseError in cElementTree:

13 answers
  •  情歌与酒
    2020-12-16 11:31

    Here is what turned out to be the gotcha for me, using Python's ElementTree. This raises the invalid token error:

    # -*- coding: utf-8 -*-
    import xml.etree.ElementTree as ET
    
    xml = u"""
    Did you verify those street names?"""
    
    xmltest = ET.fromstring(xml.encode("utf-8"))
    

    However, it works once a hyphen is added to the encoding name in the XML declaration (encoding='utf-8' rather than encoding='utf8').

    Most odd. Someone found this footnote in the Python docs:

    The encoding string included in XML output should conform to the appropriate standards. For example, “UTF-8” is valid, but “UTF8” is not.
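
To compare the two spellings side by side, here is a minimal sketch. It assumes a small stand-in document, since `BODY` is a hypothetical payload and not the XML from the answer, and `try_parse` is a helper name invented for illustration. Whether the unhyphenated label actually fails may depend on the Python version and the underlying parser, so the sketch reports the outcome instead of assuming it.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in payload: a short element with some non-ASCII
# (Cyrillic) text, which is the kind of content that can trip the parser.
BODY = (
    "<comment><text>Did you verify those street names? "
    "\u0421\u044a\u0435\u0448\u044c</text></comment>"
)


def try_parse(encoding_label):
    """Parse BODY behind an XML declaration naming encoding_label.

    Returns the text of <text> on success, or the exception on failure.
    """
    doc = "<?xml version='1.0' encoding='{}'?>\n{}".format(encoding_label, BODY)
    try:
        # The bytes are always real UTF-8; only the declared label varies.
        root = ET.fromstring(doc.encode("utf-8"))
    except (ET.ParseError, ValueError) as exc:
        # ParseError is what the answer reports; ValueError is caught
        # as a precaution in case the parser rejects the label itself.
        return exc
    return root.find("text").text


hyphenated = try_parse("utf-8")
unhyphenated = try_parse("utf8")

# Print only the result types so the output is safe on any console.
print("utf-8:", type(hyphenated).__name__)
print("utf8: ", type(unhyphenated).__name__)
```

If the second line reports an exception type rather than `str`, the declaration label is the culprit, and normalizing it to `utf-8` before parsing is the fix the answer describes.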
