ParseError: not well-formed (invalid token) using cElementTree

日久生厌 2020-12-16 11:10

I receive XML strings from an external source that can contain unsanitized, user-contributed content.

The following XML string gave a ParseError in cElementTree:

13 answers
  •  悲哀的现实
    2020-12-16 11:43

    In my case, lxml solved the issue:

    from lxml import etree
    
    # Stream only the elements named 'tag_i_wanted'; encoding overrides the declared encoding
    for _, ele in etree.iterparse(xml_file, tag='tag_i_wanted', encoding='utf-8'):
        print(ele.tag, ele.text)
    

    In another case, a recovering parser worked:

    # recover=True tells the parser to keep going past malformed input
    parser = etree.XMLParser(recover=True)
    tree = etree.parse(xml_file, parser=parser)
    tags_needed = tree.iter('TAG NAME')
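
To see the difference `recover=True` makes, here is a minimal sketch on an in-memory document. The sample bytes in `bad_xml` are hypothetical, and it is an assumption that the offending input contains a control character (a backspace, `\x08`, which XML 1.0 does not allow); the original failing string is not shown in the question. What the recovered tree contains depends on how libxml2 repairs the input, so inspect it rather than relying on a particular result.

```python
from io import BytesIO

from lxml import etree

# Hypothetical input: \x08 (backspace) is not a legal XML 1.0 character.
bad_xml = b'<root><item>bad\x08char</item><item>ok</item></root>'

# A default (strict) parser rejects the document.
try:
    etree.fromstring(bad_xml)
    strict_failed = False
except etree.XMLSyntaxError:
    strict_failed = True

# A recovering parser returns whatever tree it could build.
parser = etree.XMLParser(recover=True)
tree = etree.parse(BytesIO(bad_xml), parser=parser)
root = tree.getroot()

# The problems are not silenced; they are recorded on the parser.
for entry in parser.error_log:
    print(entry.line, entry.message)

# Inspect what survived recovery.
for item in root.iter('item'):
    print(item.tag, item.text)
```

Recovery can silently drop or alter data, so checking `parser.error_log` after each parse is a reasonable habit when the input comes from an untrusted source.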
    

    Thanks to theeastcoastwest

    Python 2.7
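
If installing lxml is not an option, one alternative (not part of the answer above) is to strip illegal characters before handing the string to the standard-library parser. This is only a sketch: the helper `strip_illegal_xml_chars`, the sample string `raw`, and the assumption that the ParseError is caused by control characters are all illustrative. It does not repair other kinds of malformed XML, such as unescaped `&` or mismatched tags.

```python
import re
import xml.etree.ElementTree as ET

# Characters XML 1.0 forbids: C0 controls other than tab, LF and CR,
# plus the non-characters U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(u'[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]')


def strip_illegal_xml_chars(text):
    """Return text with characters that XML 1.0 disallows removed (hypothetical helper)."""
    return _ILLEGAL_XML_CHARS.sub(u'', text)


# Hypothetical input containing a backspace character.
raw = u'<root><item>bad\x08char</item></root>'

# The unsanitized string fails with ParseError (invalid token).
try:
    ET.fromstring(raw)
    strict_ok = True
except ET.ParseError:
    strict_ok = False

# After sanitizing, the same document parses.
root = ET.fromstring(strip_illegal_xml_chars(raw))
item_text = root.find('item').text
print(item_text)
```

Unlike `recover=True`, this approach fails loudly on anything it does not explicitly handle, which may be preferable when silent data loss is a concern.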
