Equivalent to innerHTML when using lxml.html to parse HTML

无人共我 2020-12-01 15:52

I'm working on a script using lxml.html to parse web pages. I have done a fair bit of BeautifulSoup in my time but am now experimenting with lxml due to its speed.

4 Answers
  •  一生所求
    2020-12-01 16:29

    Sorry for bringing this up again, but I've been looking for a solution and yours contains a bug:

    <body>
        This text is ignored
        <h1>Title</h1>
        <p>Some text</p>
    </body>

    Text that sits directly under the root element, before its first child (the element's `.text`), is ignored. I ended up doing this:

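    from lxml import html
    # .text holds the leading text; tostring() includes each child's tail by default
    inner = (body.text or '') + ''.join(
        html.tostring(child, encoding='unicode') for child in body.iterchildren())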
    
