How to make Beautiful Soup (bs4) match just one, and only one, CSS class

Submitted by 孤者浪人 on 2020-08-15 10:35:03

Question


I am using the following code to match all divs that have the CSS class "ad_item":

soup.find_all('div',class_="ad_item")

The problem is that the same web page also contains divs whose class attribute is set to both "ad_item" and "ad_ex_item":

<div class="ad_item ad_ex_item">

The documentation states:

When you search for a tag that matches a certain CSS class, you’re matching against any of its CSS classes:

So how can I match divs that have "ad_item" but do not have "ad_ex_item"?

Or, to put it another way: how can I search for divs whose only CSS class is "ad_item"?
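Below is a minimal reproduction of the problem. It uses a small made-up HTML snippet and Python's built-in "html.parser". class_ matches any div that lists "ad_item" among its classes, so both divs are returned.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one div with only "ad_item", one that also has "ad_ex_item"
html = """
<div class="ad_item">plain ad</div>
<div class="ad_item ad_ex_item">extended ad</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches a tag if ANY of its classes equals "ad_item",
# so both divs are returned here
matches = soup.find_all("div", class_="ad_item")
for div in matches:
    # bs4 stores the class attribute as a list of class names
    print(div["class"], div.get_text())
```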


Answer 1:


I have found one solution. It has nothing to do with BS4 itself; it is plain Python code:

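for item in soup.find_all('div', class_="ad_item"):
    # bs4 returns the class attribute as a list of class names
    if len(item["class"]) != 1:
        continue  # skip divs that have more than one class
    print(item)  # handle the div whose only class is "ad_item"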

It simply skips an item if it has more than one CSS class.
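The same check can be wrapped in a small reusable function. The helper name find_with_only_class and the sample HTML are hypothetical and are not part of bs4. The function relies on the fact that find_all() has already guaranteed the class is present, so a one-element class list can only contain that class.

```python
from bs4 import BeautifulSoup

def find_with_only_class(soup, tag_name, css_class):
    """Hypothetical helper: return tags whose class list is exactly [css_class]."""
    results = []
    for item in soup.find_all(tag_name, class_=css_class):
        # find_all() already ensures css_class is among the classes,
        # so a single-element list must be [css_class]
        if len(item["class"]) != 1:
            continue  # skip tags that carry extra classes
        results.append(item)
    return results

html = """
<div class="ad_item">keep</div>
<div class="ad_item ad_ex_item">skip</div>
"""
soup = BeautifulSoup(html, "html.parser")
print([d.get_text() for d in find_with_only_class(soup, "div", "ad_item")])  # ['keep']
```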




Answer 2:


You can pass a lambda function to the find and find_all methods.

soup.find_all(lambda x:
    x.name == 'div' and
    'ad_item' in x.get('class', []) and
    'ad_ex_item' not in x['class']
)

The x.get('class', []) call avoids KeyError exceptions for div tags that have no class attribute.
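Here is a self-contained run of the lambda above. It assumes some hypothetical sample markup that includes a div with no class attribute, which shows why the [] default matters. The later x['class'] lookup is safe because it is only reached once 'ad_item' has been found in the class list.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, including a div with no class attribute
html = """
<div class="ad_item">keep</div>
<div class="ad_item ad_ex_item">skip</div>
<div>no class</div>
"""
soup = BeautifulSoup(html, "html.parser")

result = soup.find_all(lambda x:
    x.name == "div"
    and "ad_item" in x.get("class", [])   # [] default: no KeyError for classless divs
    and "ad_ex_item" not in x["class"]    # only reached when "class" exists
)
print([d.get_text() for d in result])  # ['keep']
```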

If you need to exclude more than one class, you can replace the last condition with:

    not any(c in x['class'] for c in {'ad_ex_item', 'another_class'})

And if you want to exclude only the tags that have all of the listed classes at once, you can use:

    not all(c in x['class'] for c in {'ad_ex_item', 'another_class'})
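The difference between the two conditions can be seen in plain Python. The sketch below applies them to hypothetical class lists of the kind bs4 returns from tag["class"]. `not any(...)` drops a tag that has at least one excluded class. `not all(...)` drops a tag only when it has every excluded class.

```python
# Hypothetical class lists, as bs4 would return them from tag["class"]
class_lists = [
    ["ad_item"],
    ["ad_item", "ad_ex_item"],
    ["ad_item", "another_class"],
    ["ad_item", "ad_ex_item", "another_class"],
]
excluded = {"ad_ex_item", "another_class"}

# not any(...): reject a tag if it has at least one excluded class
keep_if_none = [c for c in class_lists if not any(x in c for x in excluded)]

# not all(...): reject a tag only if it has every excluded class
keep_unless_all = [c for c in class_lists if not all(x in c for x in excluded)]

print(keep_if_none)     # [['ad_item']]
print(keep_unless_all)  # [['ad_item'], ['ad_item', 'ad_ex_item'], ['ad_item', 'another_class']]
```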



Answer 3:


You can use a strict attribute condition like this:

soup.select("div[class='ad_item']")

That catches divs whose class attribute is exactly 'ad_item', i.e. divs with 'ad_item' and no other space-separated classes.
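The following sketch contrasts the exact attribute selector with the ordinary class selector, using hypothetical sample HTML. [class='ad_item'] compares the whole attribute value, so it does not match the div that also has ad_ex_item. The class selector .ad_item matches both divs.

```python
from bs4 import BeautifulSoup

html = """
<div class="ad_item">keep</div>
<div class="ad_item ad_ex_item">skip</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact attribute match: only divs whose class value is exactly "ad_item"
exact = soup.select("div[class='ad_item']")
# Class selector: any div that has "ad_item" among its classes
any_match = soup.select("div.ad_item")

print(len(exact), len(any_match))  # 1 2
```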




Answer 4:


Did you try using select? http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors

soup.select(".ad_item")

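Note that .ad_item on its own still matches divs that also carry ad_ex_item. Excluding those would need the CSS3 :not selector, which unfortunately did not seem to be supported when this answer was written. If you really need it, you may have to look at lxml, which seems to support it; see http://packages.python.org/cssselect/#supported-selectors (Beautiful Soup 4.7.0 and later delegate select() to the Soup Sieve library, which does support :not().)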
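With Beautiful Soup 4.7.0 or later, you can use the :not() pseudo-class directly in select(). The minimal sketch below uses hypothetical sample HTML and the built-in html.parser. This selector only rules out ad_ex_item, so a div with some other extra class would still match.

```python
from bs4 import BeautifulSoup

html = """
<div class="ad_item">keep</div>
<div class="ad_item ad_ex_item">skip</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Requires bs4 >= 4.7.0, where select() is backed by Soup Sieve.
# Matches divs with class ad_item that do NOT also have ad_ex_item.
result = soup.select("div.ad_item:not(.ad_ex_item)")
print([d.get_text() for d in result])  # ['keep']
```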




Answer 5:


You can always write a Python function that matches the tag you want, and pass that function into find_all():

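def match(tag):
    # tag.get('class') returns None for tags without a class attribute,
    # so default to [] to avoid a TypeError on the `in` checks
    return (
        tag.name == 'div'
        and 'ad_item' in tag.get('class', [])
        and 'ad_ex_item' not in tag.get('class', []))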

soup.find_all(match)
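The same idea can be generalized into a function that builds such matchers for any set of required and forbidden classes. make_class_matcher is a hypothetical helper and is not part of bs4. The sample HTML is made up, and it includes a classless div to exercise the [] default.

```python
from bs4 import BeautifulSoup

def make_class_matcher(tag_name, required, forbidden=()):
    """Hypothetical helper: build a filter function for find_all()."""
    required, forbidden = set(required), set(forbidden)

    def match(tag):
        classes = set(tag.get("class", []))  # [] default for classless tags
        return (
            tag.name == tag_name
            and required <= classes          # every required class present
            and not (forbidden & classes))   # no forbidden class present
    return match

html = """
<div>no class</div>
<div class="ad_item">keep</div>
<div class="ad_item ad_ex_item">skip</div>
<div class="ad_item another_class">skip too</div>
"""
soup = BeautifulSoup(html, "html.parser")
matcher = make_class_matcher("div", {"ad_item"}, {"ad_ex_item", "another_class"})
print([d.get_text() for d in soup.find_all(matcher)])  # ['keep']
```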



Answer 6:


The top answer is correct, but if you want to keep the for loop clean, or you prefer one-line solutions, you can use the list comprehension below.

data = [item for item in soup.find_all("div", class_="ad_item") if len(item["class"]) == 1] 



Answer 7:


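soup.find_all('div', {'class': 'ad_item'})

(fetch() was Beautiful Soup 3's alias for findAll(); in bs4 the equivalent is find_all(). Like class_, this attrs dictionary still matches divs that have other classes as well.)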


Source: https://stackoverflow.com/questions/14496860/how-to-beautiful-soup-bs4-match-just-one-and-only-one-css-class
