Equivalent to InnerHTML when using lxml.html to parse HTML


无人共我 2020-12-01 15:52

I'm working on a script that uses lxml.html to parse web pages. I have done a fair bit of BeautifulSoup in my time, but I am now experimenting with lxml because of its speed.

4 answers

一生所求 (original poster)

2020-12-01 16:29
Sorry for bringing this up again, but I've been looking for a solution and yours contains a bug:
```
<body>This text is ignored
<h1>Title</h1><p>Some text</p></body>
```
Text directly under the root element is ignored. I ended up doing this:
```
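from lxml import html  # `body` is an element parsed with lxml.html
# encoding='unicode' makes tostring() return str rather than bytes (needed on Python 3)
inner_html = (body.text or '') + ''.join(
    html.tostring(child, encoding='unicode') for child in body.iterchildren())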
```
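For anyone who wants to try this end to end, here is a minimal sketch of the same idea wrapped in a function. The name `inner_html` and the sample markup are my own, not part of lxml; I am assuming Python 3, which is why `tostring()` is called with `encoding='unicode'`.

```python
from lxml import html


def inner_html(element):
    """Return the markup inside `element`, without the element's own tags.

    `inner_html` is a hypothetical helper name, not an lxml API.
    """
    # Text that appears before the first child is stored on element.text.
    leading_text = element.text or ''
    # tostring() serializes each child together with its tail text;
    # encoding='unicode' yields str instead of bytes.
    children = ''.join(
        html.tostring(child, encoding='unicode')
        for child in element.iterchildren()
    )
    return leading_text + children


# Sample document, made up for illustration.
doc = html.fromstring(
    '<html><body>Leading text<h1>Title</h1><p>Some text</p></body></html>'
)
body = doc.find('body')
result = inner_html(body)
print(result)
```

One caveat: `element.text` is already-decoded text, so characters such as `&` or `<` in the leading text are not re-escaped. If the result will be fed back into an HTML parser, that text probably needs escaping first.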