Beautiful Soup find_all doesn't find CSS selector with multiple classes

Posted by 徘徊边缘 on 2020-01-06 03:55:06

Question


The website contains this <a> element:

<a role="listitem" aria-level="1" href="https://www.rest.co.il" target="_blank" class="icon rest" title="this is main title" iconwidth="35px" aria-label="website connection" style="width: 30px; overflow: hidden;"></a>

So I use this code to find the element (note the find_all argument, a.icon.rest):

import requests
from bs4 import BeautifulSoup

url = 'http://www.zap.co.il/models.aspx?sog=e-cellphone&pageinfo=1'
source_code = requests.get(url)
plain_text = source_code.text
soup  = BeautifulSoup(plain_text, "html.parser")
for link in soup.find_all("a.icon.rest"):
    x = link.get('href')
    print(x)

Unfortunately, this returns nothing, although the Beautiful Soup documentation clearly says:

If you want to search for tags that match two or more CSS classes, you should use a CSS selector:

css_soup.select("p.strikeout.body")
returns: <p class="body strikeout"></p>

So why isn't this working? By the way, I'm using PyCharm.


Answer 1:


As the docs you quoted explain, if you want to search for tags that match two CSS classes, you have to use a CSS selector instead of find_all. The example you quoted shows how to do that:

css_soup.select("p.strikeout.body")

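But you didn't do that; you used find_all anyway, and of course it didn't work, because find_all doesn't take a CSS selector. It treats a string argument as a tag name, so it searches for a tag literally named a.icon.rest, which doesn't exist.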

Change it to use select, which does take a CSS selector, and it will work.
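
As a rough sketch of the difference, the snippet below runs both calls against a small inline HTML string rather than the live page. The inline markup is a trimmed-down stand-in for the element in the question, and the second link (example.com) is a made-up contrast case, not something taken from the site.

```python
from bs4 import BeautifulSoup

# Static stand-in for the page so the example runs without a network request.
# The first <a> is trimmed from the question; the second is a hypothetical
# link that carries only one of the two classes.
html = """
<a href="https://www.rest.co.il" class="icon rest" title="this is main title"></a>
<a href="https://example.com/other" class="icon"></a>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all interprets the string as a tag name, so nothing matches.
no_matches = soup.find_all("a.icon.rest")

# select interprets the string as a CSS selector:
# an <a> tag that has both the "icon" and "rest" classes.
links = [a.get("href") for a in soup.select("a.icon.rest")]

print(no_matches)
print(links)
```

Applied to the code in the question, this would mean replacing soup.find_all("a.icon.rest") with soup.select("a.icon.rest") and leaving the rest of the loop unchanged.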



Source: https://stackoverflow.com/questions/49602747/beautiful-soup-find-all-doesnt-find-css-selector-with-multiple-classes
