I'm working on a script using lxml.html to parse web pages. I have done a fair bit of BeautifulSoup in my time but am now experimenting with lxml due to its speed.
You can get the direct children of an lxml element with its getchildren() method (deprecated; iterating over the element itself does the same), or walk every descendant with iterdescendants():
>>> from lxml import etree
>>> from cStringIO import StringIO
>>> t = etree.parse(StringIO("""<body>
... <h1>A title</h1>
... <p>Some text</p>
... </body>"""))
>>> root = t.getroot()
>>> for child in root.iterdescendants():
...     print etree.tostring(child)
...
<h1>A title</h1>

<p>Some text</p>

This can be shortened to a single expression:
print ''.join([etree.tostring(child) for child in root.iterdescendants()])