Question
Is there a way to use multiple conditions in BeautifulSoup?
These are the two conditions I would like to use together:
Get text:
soup.find_all(text=True)
Get img alt:
soup.find_all('img', alt=True)
I know I can run them separately, but I would like to get both in one pass so the results keep the order in which they appear in the HTML.
The reason I'm doing this is that only BeautifulSoup extracts text hidden by CSS (display: none).
When you use driver.find_element_by_tag_name('body').text you get the img alt attribute, but unfortunately not the text hidden with display: none.
I appreciate your help. Thank you!
Answer 1:
A single .find_all() call returns either text nodes or tags, but you can write your own function that yields both the text nodes and the values of the alt= attributes, in document order.
For example:
from bs4 import BeautifulSoup, Tag, NavigableString

txt = '''
Some text
<img alt="Some alt" src="#" />
Some other text
'''

def traverse(s):
    # Walk the children in document order.
    for c in s.contents:
        if isinstance(c, Tag):
            # For <img> tags, emit the alt text at the tag's position.
            if c.name == 'img' and 'alt' in c.attrs:
                yield c['alt']
            # Recurse so nested text and images are found too.
            yield from traverse(c)
        elif isinstance(c, NavigableString):
            yield c

soup = BeautifulSoup(txt, 'html.parser')
for text in traverse(soup):
    print(text.strip())
Prints:
Some text
Some alt
Some other text
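The same idea can be sketched without recursion by iterating over soup.descendants, which visits every node in document order. The example below is only an illustration under a few assumptions: the helper name texts_and_alts and the sample HTML are made up for this sketch, and whitespace-only strings are skipped. It also shows that text inside an element styled with display:none is still returned, since BeautifulSoup does not evaluate CSS.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

def texts_and_alts(soup):
    """Yield text nodes and non-empty img alt values in document order.

    Hypothetical helper for illustration; not part of the bs4 API.
    """
    for node in soup.descendants:
        if isinstance(node, Tag):
            # Emit the alt text at the position of the <img> tag.
            if node.name == 'img' and node.get('alt'):
                yield node['alt']
        elif isinstance(node, NavigableString):
            # Skip strings that contain only whitespace.
            text = node.strip()
            if text:
                yield text

# Made-up sample document with one hidden paragraph.
html = '''
<p>Visible text</p>
<img alt="Logo" src="#" />
<p style="display:none">Hidden text</p>
'''

soup = BeautifulSoup(html, 'html.parser')
result = list(texts_and_alts(soup))
print(result)  # ['Visible text', 'Logo', 'Hidden text']
```

Note that HTML comments and the contents of script or style elements are also NavigableString subclasses, so they would be yielded as well; filter them out if that is not wanted.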
Source: https://stackoverflow.com/questions/63424162/multiple-conditions-in-beautifulsoup-text-true-img-alt-true