Equivalent to InnerHTML when using lxml.html to parse HTML

无人共我 2020-12-01 15:52

I'm working on a script that uses lxml.html to parse web pages. I have used BeautifulSoup a fair bit, but I'm now experimenting with lxml because of its speed.

4 Answers

感情败类 (original poster)

2020-12-01 16:44

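You can get the direct children of an ElementTree node with getchildren() (deprecated in lxml; iterating over the node itself does the same thing), or walk every descendant with iterdescendants(), and serialize each one with etree.tostring():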

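>>> from io import StringIO
>>> from lxml import etree
>>> t = etree.parse(StringIO("""<body>
... <h1>A title</h1>
... <p>Some text</p>
... </body>"""))
>>> root = t.getroot()
>>> # tostring() also emits each element's tail text (here, the newline after it)
>>> for child in root.iterdescendants():
...     print(etree.tostring(child, encoding="unicode"))
...
<h1>A title</h1>

<p>Some text</p>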

This can be shortened to a single expression:

print(''.join(etree.tostring(child, encoding="unicode") for child in root.iterdescendants()))
