BeautifulSoup removing tags

試著忘記壹切 submitted on 2021-01-05 08:57:35

Question


I'm trying to remove elements whose style attribute is display:none, along with their contents, from the source. It isn't working: there are no errors, the elements simply aren't decomposed. This is what I have:

source = BeautifulSoup(open("page.html"))
getbody = source.find('body')
for child in getbody[0].children:
    try:
        if child.get('style') is not None and child.get('style') == "display:none":
            # it in here
            child.decompose()
    except:
        continue
print source
# display:none divs are still there.

Answer 1:


The following code does what you want and works fine; do not use blanket except handling to mask bugs:

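from bs4 import BeautifulSoup

source = BeautifulSoup(open("page.html"), "html.parser")
# find_all searches every descendant of <body>, not just its direct children
for hidden in source.body.find_all(style='display:none'):
    hidden.decompose()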

or better still, use a regular expression to cast the net a little wider:

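import re

from bs4 import BeautifulSoup

source = BeautifulSoup(open("page.html"), "html.parser")
# the pattern tolerates optional whitespace after the colon
for hidden in source.body.find_all(style=re.compile(r'display:\s*none')):
    hidden.decompose()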
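
To see why the regular expression casts a wider net, here is a small sketch that applies the same pattern to a few style values. The sample strings are made up for illustration. As far as I know, BeautifulSoup filters attribute values with the pattern's search() method, so a match anywhere in the value counts, whereas the plain string 'display:none' only matches a value that is exactly that string.

```python
import re

# Same pattern as in the answer above.
hidden_style = re.compile(r'display:\s*none')

# Hypothetical style attribute values, invented for this example.
samples = [
    "display:none",
    "display: none",
    "color:red; display:none;",
    "display:block",
]

# search() finds the pattern anywhere in the string.
matches = [s for s in samples if hidden_style.search(s)]
print(matches)
```

Only the first sample would be caught by the exact-string filter; the regular expression also catches the second and third.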

Tag.children lists only the direct children of the body tag, not nested descendants, so the loop in the question never reaches hidden elements deeper in the tree.
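
The difference can be checked on a tiny document. This is a minimal sketch: the markup and the id values are hypothetical, and it assumes the bs4 package with the built-in "html.parser".

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: one hidden div directly under <body>, one nested deeper.
html = """
<html><body>
<div id="top" style="display:none">hidden, direct child</div>
<div id="wrapper"><p id="nested" style="display:none">hidden, nested</p></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# .children yields only direct children; skip text nodes such as newlines.
direct_ids = [c.get("id") for c in soup.body.children if isinstance(c, Tag)]
print(direct_ids)  # ['top', 'wrapper']

# find_all walks all descendants, so the nested <p> is found as well.
found_ids = [t["id"] for t in soup.body.find_all(style="display:none")]
print(found_ids)  # ['top', 'nested']

# Remove every hidden element, wherever it sits in the tree.
for hidden in soup.body.find_all(style="display:none"):
    hidden.decompose()

remaining = [t.get("id") for t in soup.body.find_all(True)]
print(remaining)  # ['wrapper']
```

Iterating over .children never visits the nested paragraph, while find_all reaches both hidden elements.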



Source: https://stackoverflow.com/questions/21654698/beautifulsoup-removing-tags
