Question
I searched thoroughly for a solution on many websites and on here, but none of them work!
I am trying to scrape flashscore.com, and I want to find every <td> whose class is either cell_ab team-home or cell_ab team-home bold.
I tried using re:
soup.find_all('td', {'class': re.compile(r"^(cell_ab team-home |cell_ab team-home bold )$")})
and
soup.find_all('td', {'class': ['cell_ab team-home ', 'cell_ab team-home bold ']})
Neither of them works.
Someone asked for the code, so here it is:
from tkinter import *
from selenium import webdriver
import threading
from bs4 import BeautifulSoup
browser = webdriver.Firefox()
browser.get('http://www.flashscore.com/')
HTML = browser.page_source
soap = BeautifulSoup(HTML)
for item in soap.find_all('td', class_ = ['cell_ab team-home ','cell_ab team-home bold ']):
    Listbox.insert(END, item.text)
Answer 1:
The bs4 documentation says the following about matching using class_:
Remember that a single tag can have multiple values for its "class" attribute. When you search for a tag that matches a certain CSS class, you're matching against any of its CSS classes.
According to the documentation, you'd have to use CSS selectors here, with the .select method. Thus something like this ought to do the trick:
soup.select('td.cell_ab.team-home')
This would select all <td>s that have both cell_ab and team-home classes set, including <td>s that have additional classes, such as bold.
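A minimal sketch of the selector approach, run against a small hand-written table instead of the live site. The HTML snippet and team names are made up for illustration, and html.parser is used only so the example needs no extra parser installed.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the trailing spaces mimic the class strings in the question.
html = """
<table><tr>
  <td class="cell_ab team-home ">Arsenal</td>
  <td class="cell_ab team-home bold ">Chelsea</td>
  <td class="cell_ac team-away ">Everton</td>
  <td class="cell_ab">Leeds</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# td.cell_ab.team-home: a <td> that carries BOTH classes; extra classes are allowed.
home_cells = soup.select("td.cell_ab.team-home")
names = [td.get_text(strip=True) for td in home_cells]
print(names)  # ['Arsenal', 'Chelsea']
```

The cell with only cell_ab and the team-away cell are skipped, while the bold variant is still picked up.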
Answer 2:
You can use re to find it:
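soap.findAll('td', {'class': re.compile(r'^(cell_ab team-home|cell_ab team-home bold)$')})
This will find td tags with class='cell_ab team-home' and td tags with class='cell_ab team-home bold'. The alternation has to sit inside the pattern string, and the pattern must not end in a space, because Beautiful Soup splits the class attribute on whitespace before matching.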
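A hedged illustration of the regular-expression approach on invented sample markup. It relies on Beautiful Soup testing each class value on its own and then the whole class string with the values joined by single spaces, which is how it appears to behave; that is why a pattern ending in a space finds nothing.

```python
import re
from bs4 import BeautifulSoup

# Invented sample markup; note the trailing spaces inside the class attributes.
html = """
<table><tr>
  <td class="cell_ab team-home ">Arsenal</td>
  <td class="cell_ab team-home bold ">Chelsea</td>
  <td class="cell_ab">Leeds</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# The joined class string has no trailing space, so anchor on the clean form.
pattern = re.compile(r"^cell_ab team-home( bold)?$")
matched = [td.get_text(strip=True) for td in soup.find_all("td", class_=pattern)]

# The same idea with a trailing space before "$" never matches.
bad_pattern = re.compile(r"^cell_ab team-home $")
missed = soup.find_all("td", class_=bad_pattern)

print(matched)
print(len(missed))
```

The second search is there only to show why the attempt in the question comes back empty.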
Answer 3:
You can use list syntax, with the class strings written without trailing spaces:
soup.findAll('td', {'class': ['cell_ab team-home', 'cell_ab team-home bold']})
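A small sketch of the list form, again on made-up markup. Each string in the list is compared against the exact class string, so the order of the class names matters; the Fulham cell below is a hypothetical case added only to show that.

```python
from bs4 import BeautifulSoup

# Invented sample markup; the last cell lists the same classes in reverse order.
html = """
<table><tr>
  <td class="cell_ab team-home ">Arsenal</td>
  <td class="cell_ab team-home bold ">Chelsea</td>
  <td class="team-home cell_ab">Fulham</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# A tag matches if its class string equals any one entry in the list.
wanted = ["cell_ab team-home", "cell_ab team-home bold"]
exact = [td.get_text(strip=True) for td in soup.find_all("td", class_=wanted)]
print(exact)  # ['Arsenal', 'Chelsea']
```

If the site ever reorders or adds class names, this form silently stops matching, which is the main reason a CSS selector tends to be more robust here.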
Answer 4:
You can use a selector like this:
soup.select('.cell_ab.team-home')
Source: https://stackoverflow.com/questions/30147223/beautiful-soup-findall-multiple-class-using-one-query