Question
I am using the following code to match all div elements that have the CSS class "ad_item".
soup.find_all('div',class_="ad_item")
The problem is that the web page also contains div elements whose CSS class is set to both "ad_item" and "ad_ex_item":
<div class="ad_item ad_ex_item">
The documentation states:
When you search for a tag that matches a certain CSS class, you’re matching against any of its CSS classes:
So how can I match div elements that have only "ad_item" and do not have "ad_ex_item"?
Or, to put it another way, how do I search for div elements whose only CSS class is "ad_item"?
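The behaviour quoted from the documentation can be reproduced with a small sketch. The HTML snippet below is made up for illustration; only the two class names come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div with only "ad_item", one with both classes.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches a tag if ANY of its classes equals the given string,
# so both divs are returned here.
matches = soup.find_all("div", class_="ad_item")
texts = [div.get_text() for div in matches]
print(texts)
```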
Answer 1:
I have found one solution, although it has nothing to do with BS4; it is pure Python code.
for item in soup.find_all('div', class_="ad_item"):
    # item["class"] is a list of the tag's class names
    if len(item["class"]) != 1:
        continue
It simply skips an item if it has more than one CSS class.
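A runnable sketch of this loop, assuming a made-up HTML snippet and a hypothetical result list named only_ad_item (the original loop does not show what happens after the check):

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
<div class="other">unrelated</div>
"""
soup = BeautifulSoup(html, "html.parser")

only_ad_item = []
for item in soup.find_all("div", class_="ad_item"):
    # Skip tags that carry any class besides "ad_item".
    if len(item["class"]) != 1:
        continue
    only_ad_item.append(item.get_text())

print(only_ad_item)
```

Note that this filters on the number of classes, so it also drops tags that combine "ad_item" with any other class, not just "ad_ex_item".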
Answer 2:
You can pass a lambda function to the find and find_all methods.
soup.find_all(lambda x:
    x.name == 'div' and
    'ad_item' in x.get('class', []) and
    'ad_ex_item' not in x['class']
)
The x.get('class', []) call avoids KeyError exceptions for div tags without a class attribute.
If you need to exclude more than one class, you can replace the last condition with:
not any(c in x['class'] for c in {'ad_ex_item', 'another_class'})
And if you want to exclude only the tags that carry all of the listed classes at once, you can use:
not all(c in x['class'] for c in {'ad_ex_item', 'another_class'})
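The difference between the two conditions is easy to miss, so here is a small pure-Python sketch. The class lists are hypothetical stand-ins for what x['class'] would return on three different tags.

```python
excluded = {"ad_ex_item", "another_class"}

# Hypothetical stand-ins for x['class'] on three different tags.
class_lists = {
    "neither": ["ad_item"],
    "one": ["ad_item", "ad_ex_item"],
    "both": ["ad_item", "ad_ex_item", "another_class"],
}

# not any(...): keep a tag only if it has none of the excluded classes.
keep_if_none = {
    name: not any(c in classes for c in excluded)
    for name, classes in class_lists.items()
}

# not all(...): drop a tag only if it has every excluded class.
keep_unless_all = {
    name: not all(c in classes for c in excluded)
    for name, classes in class_lists.items()
}

print(keep_if_none)
print(keep_unless_all)
```

With not any(...), the tag that has just one of the excluded classes is rejected; with not all(...), it is kept.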
Answer 3:
You can use a strict condition like this:
soup.select("div[class='ad_item']")
This matches only div elements whose class attribute is exactly 'ad_item',
that is, with no other space-separated classes alongside it.
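A minimal sketch of the attribute selector in use, assuming a Beautiful Soup version whose select() handles attribute-value selectors and a made-up HTML snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
"""
soup = BeautifulSoup(html, "html.parser")

# [class='ad_item'] compares against the whole class attribute value,
# so the div with two classes does not match.
exact = soup.select("div[class='ad_item']")
exact_texts = [div.get_text() for div in exact]
print(exact_texts)
```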
Answer 4:
Did you try using select? See http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors
soup.select(".ad_item")
Unfortunately, the CSS3 :not selector did not seem to be supported when this was written. If you really need it, you may have to look at lxml, which seems to support it; see http://packages.python.org/cssselect/#supported-selectors
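For readers on a recent release: since Beautiful Soup 4.7.0, select() is backed by the Soup Sieve package, which does handle :not(). A sketch under that assumption, with made-up HTML and a hypothetical extra class named "featured":

```python
from bs4 import BeautifulSoup

# Hypothetical markup; "featured" is an invented class for illustration.
html = """
<div class="ad_item">plain</div>
<div class="ad_item ad_ex_item">extended</div>
<div class="ad_item featured">featured</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Requires Beautiful Soup 4.7.0+ (Soup Sieve) for :not() support.
not_extended = soup.select("div.ad_item:not(.ad_ex_item)")
not_extended_texts = [div.get_text() for div in not_extended]
print(not_extended_texts)
```

Unlike an exact match on the class attribute, this only rules out "ad_ex_item", so a div that combines "ad_item" with some other class is still returned.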
Answer 5:
You can always write a Python function that matches the tag you want, and pass that function into find_all():
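def match(tag):
    # Default to an empty list so tags without a class attribute
    # do not raise TypeError on the membership tests below.
    classes = tag.get('class', [])
    return (
        tag.name == 'div'
        and 'ad_item' in classes
        and 'ad_ex_item' not in classes)

soup.find_all(match)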
Answer 6:
The top answer is correct, but if you want to keep the for loop clean or prefer a one-line solution, use the list comprehension below.
data = [item for item in soup.find_all("div", class_="ad_item") if len(item["class"]) == 1]
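Answer 7:
# fetch() is a name from old BeautifulSoup releases; in bs4 the equivalent call is find_all()
soup.find_all('div', {'class': 'ad_item'})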
Source: https://stackoverflow.com/questions/14496860/how-to-beautiful-soup-bs4-match-just-one-and-only-one-css-class