Python strip XML tags from document

青春惊慌失措 2020-12-19 00:44

I am trying to strip XML tags from a document using Python, a language I am a novice in. Here is my first attempt using regex, which was really a hope-for-the-best idea.

3 Answers
  •  小蘑菇 (OP)
    2020-12-19 01:23

    The most reliable way to do this is probably with lxml.

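    from lxml import etree
    ...
    # Parse the file into an element tree (raises etree.XMLSyntaxError on malformed XML).
    tree = etree.parse('somefile.xml')
    # method='text' serializes only the text content, dropping every tag;
    # encoding='unicode' returns a str rather than bytes, so print() shows plain text.
    notags = etree.tostring(tree, encoding='unicode', method='text')
    print(notags)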
    

    This avoids the problems that come with "parsing" XML using regular expressions, and it should correctly handle escaping (character entities such as &amp; and &lt; are decoded to & and <) along with similar details.
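    To make the regex pitfalls concrete, here is a small sketch contrasting a naive regex (a hypothetical one, not the asker's attempt, which isn't shown) with a parser-based approach. It swaps in the standard library's xml.etree.ElementTree and Element.itertext() in place of lxml so it runs without extra packages; the sample document and the function names strip_tags_regex and strip_tags_parsed are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration only.
SAMPLE = """<doc>
  <title>Fish &amp; Chips</title>
  <!-- a comment with a > sign -->
  <body><![CDATA[1 < 2]]> is true</body>
</doc>"""


def strip_tags_regex(xml_text: str) -> str:
    """Naive approach: delete anything that looks like <...>."""
    # Stops at the first '>', so it mangles comments and CDATA sections,
    # and it leaves entities such as &amp; undecoded.
    return re.sub(r"<[^>]*>", "", xml_text)


def strip_tags_parsed(xml_text: str) -> str:
    """Parse the XML, then concatenate every text node in document order."""
    root = ET.fromstring(xml_text)
    # itertext() yields element text and tails; the default parser has
    # already dropped comments, decoded entities and unwrapped CDATA.
    return "".join(root.itertext())


if __name__ == "__main__":
    # Collapse runs of whitespace so the two results are easy to compare.
    print(" ".join(strip_tags_regex(SAMPLE).split()))
    # → Fish &amp; Chips sign --> is true
    print(" ".join(strip_tags_parsed(SAMPLE).split()))
    # → Fish & Chips 1 < 2 is true
```

    The regex version leaks part of the comment, loses the CDATA text entirely, and leaves &amp; undecoded, while the parsed version gets all three right. Note that itertext() keeps the whitespace between elements as-is, which is why the example collapses it before printing.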
