I'm working on a script using lxml.html to parse web pages. I have done a fair bit of BeautifulSoup in my time but am now experimenting with lxml due to its speed.
Sorry for bringing this up again, but I've been looking for a solution and yours contains a bug:
This text is ignored TitleSome text
Some text
Text directly under the root element is ignored. I ended up doing this:
(body.text or '') + ''.join(html.tostring(child, encoding='unicode') for child in body.iterchildren())
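
For anyone who wants to try it, here is a minimal sketch of that expression wrapped in a helper. The function name `inner_html` and the sample markup are my own, made up for illustration, and I'm assuming Python 3, where `html.tostring` returns bytes unless you pass `encoding="unicode"`.

```python
from lxml import html


def inner_html(element):
    """Return the markup inside `element`, including text before the first child."""
    # element.text is the text before the first child; it is None when absent.
    leading = element.text or ""
    # tostring() serializes each child and, by default, the tail text after it.
    children = "".join(
        html.tostring(child, encoding="unicode")
        for child in element.iterchildren()
    )
    return leading + children


# Hypothetical sample markup: text sits directly under the root element.
root = html.fromstring(
    "<div>Leading text <h1>Title</h1><p>Some text</p> trailing</div>"
)
print(inner_html(root))
```

The leading text is kept because `element.text` is added up front, and the text after each child is kept because `tostring` includes tails by default.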