How can I get all the text content of an XML document as a single string, like this Ruby/hpricot example, but using Python?
I'd like to replace XML tags with a sin
A solution that doesn't require an external library like BeautifulSoup, using the built-in sax parsing framework:
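    from xml import sax

    class MyHandler(sax.handler.ContentHandler):
        def parse(self, filename):
            # Collect text chunks in a list and join them once parsing finishes.
            self.text = []
            sax.parse(filename, self)
            return ''.join(self.text)

        def characters(self, data):
            # Called by the parser for each run of character data between tags.
            self.text.append(data)

    result = MyHandler().parse("yourfile.xml")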
If you need all whitespace intact in the text, also define the ignorableWhitespace method in the handler class in the same way characters is defined.
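As a rough sketch of that variant, the handler below defines both callbacks and can be tried without a file on disk. The names `TextCollector` and `extract_text` are made up for illustration, and `sax.parseString` is used instead of `sax.parse` only so the example is self-contained. Note that the SAX documentation describes `ignorableWhitespace` as something validating parsers report, so with the default parser it may never be called.

```python
from xml import sax


class TextCollector(sax.handler.ContentHandler):
    """Hypothetical handler that gathers every piece of text it is given."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, data):
        # Regular character data found between tags.
        self.text.append(data)

    def ignorableWhitespace(self, whitespace):
        # Whitespace a parser chooses to report separately as ignorable.
        self.text.append(whitespace)


def extract_text(xml_bytes):
    # Parse an in-memory document and return its text as one string.
    handler = TextCollector()
    sax.parseString(xml_bytes, handler)
    return "".join(handler.text)


print(extract_text(b"<a>Hello <b>world</b>!</a>"))  # prints: Hello world!
```

The parser may deliver text in several chunks, which is why the pieces are appended to a list and joined at the end rather than assumed to arrive in one call.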