How to find the li tags with a specific class name but not others? For example:
...
no wanted
not his
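You can use a CSS attribute selector to match the whole class attribute exactly, so only tags whose class is z and nothing else are returned.
from bs4 import BeautifulSoup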
html = '''- no wanted
- not his one
- neither this one
- neither this one
- neither this one
- I WANT THIS ONLY ONE
'''
soup = BeautifulSoup(html, 'lxml')
tags = soup.select('li[class="z"]')
print(tags)
The same result can be achieved by passing a lambda to find_all:
tags = soup.find_all(lambda tag: tag.name == 'li' and tag.get('class') == ['z'])
Output:
[<li class="z">I WANT THIS ONLY ONE</li>]
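For a version that can be pasted and run as-is, here is a minimal sketch. The class names other than z (a and b) are placeholders I'm assuming for illustration, since the question's markup is not fully shown, and html.parser is used instead of lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: every class name except "z" is an assumed placeholder.
html = '''
<ul>
  <li>not wanted</li>
  <li class="a">not this one</li>
  <li class="a z">neither this one</li>
  <li class="z b">neither this one</li>
  <li class="z">I WANT THIS ONLY ONE</li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')

# Attribute selector: compares the whole class attribute value to "z".
exact_css = soup.select('li[class="z"]')

# Function filter: the parsed class list must be exactly ['z'].
exact_lambda = soup.find_all(
    lambda tag: tag.name == 'li' and tag.get('class') == ['z']
)

print(exact_css)
print(exact_lambda)
```

Both calls should return a single li here. Note that tag.get('class') returns None for an li without a class attribute, so the comparison with ['z'] is simply False for it rather than raising an error.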
Have a look at Multi-valued attributes in the Beautiful Soup documentation. You'll understand why class_='z' matches every tag that has z among its classes, not just the tags whose only class is z.
HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:
css_soup = BeautifulSoup('<p class="body"></p>')
css_soup.p['class']
# ["body"]
css_soup = BeautifulSoup('<p class="body strikeout"></p>')
css_soup.p['class']
# ["body", "strikeout"]
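To see how the list representation drives the matching, here is a small sketch. The markup and the class name a are made up for illustration. It compares class_='z' with an exact comparison against the class list, and also shows the multi_valued_attributes=None constructor argument, which (assuming a reasonably recent Beautiful Soup 4 release) keeps class as a single string.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class name "a" is an assumed placeholder.
html = '<li class="a z">both</li><li class="z">only z</li>'

soup = BeautifulSoup(html, 'html.parser')

# class is multi-valued, so each tag carries a list of class names.
class_lists = [li['class'] for li in soup.find_all('li')]

# class_='z' matches any tag whose class list contains "z".
loose = soup.find_all('li', class_='z')

# Comparing against the full list keeps only the exact match.
strict = [li for li in loose if li['class'] == ['z']]

# With multi_valued_attributes=None, class stays a single string.
plain = BeautifulSoup(html, 'html.parser', multi_valued_attributes=None)
class_strings = [li['class'] for li in plain.find_all('li')]

print(class_lists)
print(len(loose), len(strict))
print(class_strings)
```

Here loose contains both li tags while strict contains only the second one, which is the difference the question is asking about.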