I receive xml strings from an external source that can contains unsanitized user contributed content.
The following xml string gave a ParseError in cElementTre
A solution for gottcha for me, using Python's ElementTree... this has the invalid token error:
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
xml = u"""
Did you verify those street names? """
xmltest = ET.fromstring(xml.encode("utf-8"))
However, it works with the addition of a hyphen in the encoding type:
Most odd. Someone found this footnote in the python docs:
The encoding string included in XML output should conform to the appropriate standards. For example, “UTF-8” is valid, but “UTF8” is not.