I am trying to get the whole content between an opening XML tag and its closing counterpart.
Getting the content in straightforward cases like the title below is easy.
Here's something that works for me and your sample:
from lxml import etree
doc = etree.XML(
"""
<review>
  <title>Some testing stuff</title>
  <text>Some text with data in it.</text>
</review>
"""
)
def flatten(seq):
    r = []
    for item in seq:
        if isinstance(item,(str,unicode)):
            r.append(unicode(item))
        elif isinstance(item,(etree._Element,)):
            r.append(etree.tostring(item,with_tail=False))
    return u"".join(r)

print flatten(doc.xpath('/review/text/node()'))
Yields:
Some text with data in it.
The XPath selects all child nodes of the text element. Each node is either converted to unicode directly if it is a str/unicode subclass, or passed to etree.tostring if it is an Element. with_tail=False avoids duplicating the tail text, which node() already returns as a separate text node.
You may need to handle other node types if they are present.