How to find all comments with Beautiful Soup

Submitted by 别来无恙 on 2019-12-17 06:45:56

Question


This question was asked four years ago, but the answer is now out of date for BS4.

I want to delete all comments in my HTML file using Beautiful Soup. Since BS4 represents each comment as a special type of NavigableString, I thought this code would work:

for comments in soup.find_all('comment'):
     comments.decompose()

That didn't work. How do I find all comments using BS4?


Answer 1:


You can pass a function as the string argument of find_all(), so that it checks whether each string in the tree is a Comment.
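Before the full example, here is a minimal sketch of why an isinstance() check is enough. Every text node in the tree is a NavigableString, and comments are the ones whose type is the Comment subclass. The sample markup and variable names are illustrative only.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Illustrative markup: one ordinary text node followed by one comment
html = "<p>visible text<!-- hidden note --></p>"
soup = BeautifulSoup(html, "html.parser")

# Every string node in the tree is a NavigableString (or a subclass of it)
strings = [node for node in soup.descendants if isinstance(node, NavigableString)]

# Comment subclasses NavigableString, so isinstance() separates the two kinds
comments = [str(s) for s in strings if isinstance(s, Comment)]
plain_text = [str(s) for s in strings if not isinstance(s, Comment)]

print(issubclass(Comment, NavigableString))  # True
print(comments)    # [' hidden note ']
print(plain_text)  # ['visible text']
```

The comment text keeps the spaces that sat between `<!--` and `-->`.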

For example, given the following HTML:

<body>
   <!-- Branding and main navigation -->
   <div class="Branding">The Science &amp; Safety Behind Your Favorite Products</div>
   <div class="l-branding">
      <p>Just a brand</p>
   </div>
   <!-- test comment here -->
   <div class="block_content">
      <a href="https://www.google.com">Google</a>
   </div>
</body>

Code:

from bs4 import BeautifulSoup as BS
from bs4 import Comment
# .... (html holds the markup shown above)
soup = BS(html, 'html.parser')
# Comment is a NavigableString subclass, so this filter keeps only comment nodes
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for c in comments:
    print(c)
    print("===========")
    c.extract()  # detach the comment from the parse tree

The output is:

 Branding and main navigation 
===========
 test comment here 
===========
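extract() detaches each comment from the tree, so re-serializing the soup afterwards should show the markup without any comments. Here is a short check against a trimmed-down copy of the HTML above. The trimming is only for brevity.

```python
from bs4 import BeautifulSoup, Comment

# Trimmed-down version of the example HTML (shortened for brevity)
html = """<body>
<!-- Branding and main navigation -->
<div class="Branding">The Science &amp; Safety</div>
<!-- test comment here -->
</body>"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list, so extracting while iterating over it is safe
for c in soup.find_all(string=lambda text: isinstance(text, Comment)):
    c.extract()

cleaned = str(soup)
print("<!--" in cleaned)  # False: no comment markers remain
print(len(soup.find_all(string=lambda text: isinstance(text, Comment))))  # 0
```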

As for why find_all('comment') doesn't work: a string passed as the first argument is the name filter, which is compared against tag names only. From the Beautiful Soup documentation:

Pass in a value for name and you’ll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don’t match.
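You can confirm this with a small experiment. A positional string only finds an element literally named `<comment>`, while real HTML comments have to be matched through the string argument. The markup below is made up for the demonstration.

```python
from bs4 import BeautifulSoup, Comment

# Made-up markup: one real HTML comment and one tag that happens to be named "comment"
html = "<div><!-- a real comment --><comment>a tag named comment</comment></div>"
soup = BeautifulSoup(html, "html.parser")

# The first positional argument is `name`, which is compared with tag names only
by_name = soup.find_all("comment")
print([tag.name for tag in by_name])  # ['comment'] (the tag, not the HTML comment)

# Comment nodes are matched through the `string` argument instead
by_type = soup.find_all(string=lambda text: isinstance(text, Comment))
print([str(c) for c in by_type])  # [' a real comment ']
```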




Answer 2:


Two things I needed to do:

First, import Comment along with BeautifulSoup:

from bs4 import BeautifulSoup, Comment

Second, here's the code to extract the comments:

# findAll(text=...) is the older spelling of find_all(string=...)
for comment in soup.findAll(text=lambda text: isinstance(text, Comment)):
    comment.extract()
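If you need this in more than one place, the same approach can go into a small function. `strip_comments` is a hypothetical helper name, not part of Beautiful Soup. The sketch assumes the built-in html.parser and uses the current find_all(string=...) spelling.

```python
from bs4 import BeautifulSoup, Comment


def strip_comments(html: str, parser: str = "html.parser") -> str:
    """Return `html` re-serialized with every HTML comment removed (hypothetical helper)."""
    soup = BeautifulSoup(html, parser)
    # find_all() returns a list, so extracting while looping over it is safe
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        comment.extract()
    return str(soup)


print(strip_comments("<p>keep<!-- drop --> me</p>"))  # <p>keep me</p>
```

A different parser, such as lxml if installed, may wrap fragments in `<html><body>`, so the exact serialized output depends on the parser.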


Source: https://stackoverflow.com/questions/33138937/how-to-find-all-comments-with-beautiful-soup
