Equivalent to innerHTML when using lxml.html to parse HTML

无人共我 2020-12-01 15:52

I'm working on a script using lxml.html to parse web pages. I have done a fair bit of BeautifulSoup in my time but am now experimenting with lxml due to its speed.

4 Answers
  •  感情败类
    2020-12-01 16:44

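    You can get the children of an element with lxml and serialize each one with etree.tostring(). Iterating over the element (or calling its getchildren() method, which lxml marks as deprecated) yields only its direct children, while iterdescendants() walks every nested element. For the flat sample below both give the same result: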

    >>> from io import StringIO
    >>> from lxml import etree
    >>> t = etree.parse(StringIO("""<body>
    ... <h1>A title</h1>
    ... <p>Some text</p>
    ... </body>"""))
    >>> root = t.getroot()
    >>> for child in root.iterdescendants():
    ...     print(etree.tostring(child, encoding="unicode"))
    ...
    <h1>A title</h1>

    <p>Some text</p>

    This can be shortened to a single expression:

    print(''.join(etree.tostring(child, encoding="unicode") for child in root.iterdescendants()))
    
