Multiple conditions in BeautifulSoup: Text=True & IMG Alt=True

萝らか妹 posted on 2021-01-28 05:49:30

Question


Is there a way to use multiple conditions in BeautifulSoup?

These are the two conditions I would like to use together:

Get text:

soup.find_all(text=True)

Get img alt:

soup.find_all('img', alt=True)

I know I can run them separately, but I would like to get both in one pass so the results keep the order in which they appear in the HTML.

The reason I'm doing this is that only BeautifulSoup extracts text hidden with CSS display: none.

When you use driver.find_element_by_tag_name('body').text you get the img alt attribute, but unfortunately not the text hidden with CSS display: none.
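To illustrate the BeautifulSoup half of that observation, here is a minimal sketch. The markup is a made-up example, not taken from the page in question. BeautifulSoup only parses the markup and does not evaluate CSS, so a text node inside an element styled with display:none comes back like any other.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one visible paragraph and one hidden with inline CSS.
html = '<p>Visible</p><p style="display:none">Hidden</p>'

soup = BeautifulSoup(html, 'html.parser')

# string=True matches every text node; CSS is never applied,
# so the hidden paragraph's text is included.
strings = [str(s) for s in soup.find_all(string=True)]
print(strings)  # ['Visible', 'Hidden']
```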

I appreciate your help. Thank you!


Answer 1:


A single .find_all() call returns either text nodes or tags, not both, but you can write your own function that yields the text nodes of the soup together with the values of the alt= attributes, in document order.

For example:

from bs4 import BeautifulSoup, Tag, NavigableString


txt = '''
Some text
<img alt="Some alt" src="#" />
Some other text
'''

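def traverse(s):
    """Yield text nodes and img alt values in document order."""
    for c in s.contents:
        if isinstance(c, Tag):
            # For an <img> that has an alt attribute, yield the alt text.
            if c.name == 'img' and 'alt' in c.attrs:
                yield c['alt']
            # Recurse into the tag to reach nested text and images.
            yield from traverse(c)
        elif isinstance(c, NavigableString):
            yield c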


soup = BeautifulSoup(txt, 'html.parser')

for text in traverse(soup):
    print(text.strip())

Prints:

Some text
Some alt
Some other text
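The same idea can also be written without explicit recursion. The following is only a sketch of one possible variant, not part of the original answer: it walks soup.descendants, which visits nodes in document order, and additionally skips HTML comments and whitespace-only strings. The function name texts_and_alts and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup, Comment, NavigableString, Tag

# Hypothetical markup with nested text, an image, a comment and hidden text.
html = '''
<div>Some text <b>bold</b></div>
<img alt="Some alt" src="#" />
<!-- a comment -->
<p style="display:none">Hidden text</p>
'''

def texts_and_alts(soup):
    """Yield stripped text nodes and non-empty img alt values in document order."""
    for node in soup.descendants:
        if isinstance(node, Tag):
            # Only <img> tags contribute a value of their own.
            if node.name == 'img' and node.get('alt'):
                yield node['alt']
        elif isinstance(node, NavigableString) and not isinstance(node, Comment):
            # Comment is a NavigableString subclass, so it is excluded explicitly.
            text = node.strip()
            if text:
                yield text

soup = BeautifulSoup(html, 'html.parser')
result = list(texts_and_alts(soup))
print(result)  # ['Some text', 'bold', 'Some alt', 'Hidden text']
```

Because every text node is stripped inside the generator, the caller does not need to call .strip() or filter out blank lines afterwards.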


Source: https://stackoverflow.com/questions/63424162/multiple-conditions-in-beautifulsoup-text-true-img-alt-true
