Beautifulsoup 4: Remove comment tag and its content

挽巷 2020-12-31 04:35

The page that I'm scraping contains this HTML. How do I remove the comment tag along with its content using bs4?

3 answers

难免孤独 (original poster)

2020-12-31 05:34
Usually, modifying the bs4 parse tree is unnecessary. You can just get the div's text, if that's what you want:
```
soup.body.div.text
Out[18]: '\ncat dog sheep goat\n\n'
```
bs4 stores the comment as a separate `Comment` node, so `.text` leaves it out. However, if you really need to modify the parse tree:
```
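from bs4 import Comment

# Copy the children into a list first: extracting nodes while iterating
# over .children directly can skip siblings.
for child in list(soup.body.div.children):
    if isinstance(child, Comment):
        child.extract()  # detach the comment from the tree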
```
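The loop above only looks at direct children of the div. If comments can also sit deeper in the tree, one possible approach is to search the whole document for `Comment` nodes with `find_all(string=...)`. The following is a minimal, self-contained sketch; the HTML is a made-up stand-in (the markup from the question is not shown in this thread), and the `html.parser` backend is just an assumption to avoid extra dependencies:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: the HTML from the question is not available here.
html = """<html><body><div>
cat dog sheep goat
<!-- <p>hidden</p> -->
<span>bird<!-- nested comment --></span>
</div></body></html>"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a list, so extracting while looping over it is safe.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
for comment in comments:
    comment.extract()  # removes the comment node and its content

# Re-run the search to confirm nothing is left.
remaining = soup.find_all(string=lambda s: isinstance(s, Comment))
cleaned = str(soup)
```

After this runs, `cleaned` holds the serialized document without any `<!-- ... -->` sections, while ordinary text and tags are left in place.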