I am trying to strip XML tags from a document using Python, a language I am a novice in. Here is my first attempt using regex, whixh was really a hope-for-the-best idea.
The most reliable way to do this is probably with LXML.
from lxml import etree
...
tree = etree.parse('somefile.xml')
notags = etree.tostring(tree, encoding='utf8', method='text')
print(notags)
It will avoid the problems with "parsing" XML with regular expressions, and should correctly handle escaping and everything.