I receive xml strings from an external source that can contains unsanitized user contributed content.
The following xml string gave a ParseError in cElementTre
After lots of searching through the entire WWW, I only found out that you have to escape certain characters if you want your XML parser to work! Here's how I did it and worked for me:
escape_illegal_xml_characters = lambda x: re.sub(u'[\x00-\x08\x0b\x0c\x0e-\x1F\uD800-\uDFFF\uFFFE\uFFFF]', '', x)
And use it like you'd normally do:
ET.XML(escape_illegal_xml_characters(my_xml_string)) #instead of ET.XML(my_xml_string)