BeautifulSoup 4: Remove a comment tag and its content

挽巷 2020-12-31 04:35

So the page I'm scraping contains this HTML. How do I remove the comment tag, along with its content, using bs4?

3 answers
  •  难免孤独
    2020-12-31 05:34

    Modifying the bs4 parse tree is usually unnecessary. If the div's text is all you want, you can read it directly:

    soup.body.div.text
    # '\ncat dog sheep goat\n\n'
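
To see why .text leaves the comment out, here is a minimal sketch. The question's actual markup isn't shown, so the html string below is a made-up stand-in. The comment is still in the tree as a Comment object, but .text only joins the ordinary strings around it:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup standing in for the page in the question.
html = "<body><div>cat dog <!-- sheep --> goat</div></body>"
soup = BeautifulSoup(html, "html.parser")

div = soup.body.div
# The comment is still part of the parse tree, as a Comment object...
print([type(child).__name__ for child in div.children])
# ...but .text skips it and joins only the ordinary strings.
print(repr(div.text))
```

Note that the whitespace on both sides of the comment survives, so the joined text can contain doubled spaces; call .get_text(" ", strip=True) if you want it normalized.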
    

    bs4 parses the comment into a separate Comment object, which .text leaves out. However, if you really do need to remove it from the parse tree:

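    from bs4 import Comment

    # Iterate over a copy: extracting from .children while looping over it
    # would skip the element that follows each removed comment.
    for child in list(soup.body.div.children):
        if isinstance(child, Comment):
            child.extract()  # remove the comment node from the tree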
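
If comments can appear anywhere in the page rather than only as direct children of one div, a common bs4 idiom is to search the whole tree for strings that are Comment instances and extract each one. This is a sketch with hypothetical markup. find_all(string=...) returns a list up front, so extracting while looping over that list is safe:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical page with comments at different nesting levels.
html = """
<body>
  <!-- top-level comment -->
  <div>cat <!-- nested --> dog<p>sheep <!-- deeper --> goat</p></div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all(string=...) accepts a function; keep only the Comment nodes.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()  # detach the comment node (and its text) from the tree

print(soup.prettify())
```

The regular text ("cat", "dog", "sheep", "goat") stays in place. Only the comment nodes are detached.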
    
