Get all text from an XML document?

闹比i 2020-12-11 07:17

How can I get all the text content of an XML document as a single string, like this Ruby/hpricot example, but using Python?

I'd like to replace XML tags with a sin

5 Answers
  •  Happy的楠姐
    2020-12-11 07:26

    A solution that doesn't require an external library like BeautifulSoup, using the SAX parsing framework (xml.sax) built into Python's standard library:

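    from xml import sax
    
    class MyHandler(sax.handler.ContentHandler):
        def parse(self, filename):
            # Reset the buffer, run the SAX parser with this handler,
            # then join the collected chunks into one string.
            self.text = []
            sax.parse(filename, self)
            return ''.join(self.text)
    
        def characters(self, data):
            # Called by the parser for each chunk of character data;
            # one text node may arrive in several chunks.
            self.text.append(data)
    
    result = MyHandler().parse("yourfile.xml")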
    

    If you need all whitespace intact in the text, also define the ignorableWhitespace method in the handler class in the same way characters is defined.
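
As a minimal sketch of the whitespace suggestion above, the handler below defines ignorableWhitespace the same way as characters. The names TextCollector and xml_text and the sample document are hypothetical, chosen only for illustration, and the input is parsed from an in-memory byte string with sax.parseString so the example runs without a file. Whether ignorableWhitespace is ever called depends on the parser in use; many parsers report all whitespace through characters instead.

```python
from xml import sax


class TextCollector(sax.handler.ContentHandler):
    """Hypothetical handler that collects every chunk of character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, data):
        # Regular character data (may arrive in several chunks).
        self.text.append(data)

    def ignorableWhitespace(self, whitespace):
        # Only invoked by parsers that report ignorable whitespace separately.
        self.text.append(whitespace)


def xml_text(xml_bytes):
    """Return all text content of an XML document given as bytes."""
    handler = TextCollector()
    sax.parseString(xml_bytes, handler)
    return "".join(handler.text)


sample = b"<root><a>Hello</a> <b>world</b></root>"
print(xml_text(sample))
```

Keeping the buffer as a list and joining once at the end avoids repeated string concatenation, which matters for large documents.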
