Question
I need to fetch milestone information from GitHub by scraping.
The milestone information is embedded in divs with one of two class values:
table-list-item milestone notdue and table-list-item milestone.
How can I retrieve the information contained in both classes?
I have:
milestones = soup.find_all('div', {'class': 'table-list-item milestone'})
but this line returns nothing for the divs whose class is table-list-item milestone notdue.
Right now I am doing the following (ugly hack):
milestones = soup.find_all('div', {'class':'table-list-item milestone'})
milestones.extend(soup.findAll('div', {'class': 'table-list-item milestone notdue'}))
Is there a more elegant solution for this?
As per this question, BeautifulSoup is supposed to return all matching ones. My issue is exactly opposite!
Answer 1:
soup.find_all('div', {'class': 'milestone'})
or use a CSS selector:
soup.select('.milestone')
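In bs4, class is a multi-valued attribute.
Its value is stored as a list: ['table-list-item', 'milestone', 'notdue'] and ['table-list-item', 'milestone'].
A class string that contains spaces is compared against the whole attribute value, which is why 'table-list-item milestone' misses the notdue divs. What you need to do is match on a value that both share, such as milestone.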
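To make the difference concrete, here is a minimal sketch. The HTML snippet is made up for illustration (it is not the real GitHub markup), and the variable names are my own; only find_all, select and get_text are actual bs4 calls.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the milestones page (assumption).
html = """
<div class="table-list-item milestone notdue">Milestone A</div>
<div class="table-list-item milestone">Milestone B</div>
<div class="table-list-item other">Not a milestone</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A class string with spaces is matched against the whole attribute value,
# so the "notdue" div is skipped.
exact = soup.find_all("div", {"class": "table-list-item milestone"})

# A single class name is matched against each value in the class list.
by_token = soup.find_all("div", class_="milestone")

# The CSS selector requires both class names, in any order, and allows extras.
by_css = soup.select("div.table-list-item.milestone")

exact_texts = [d.get_text() for d in exact]
token_texts = [d.get_text() for d in by_token]
css_texts = [d.get_text() for d in by_css]

print(exact_texts)  # ['Milestone B']
print(token_texts)  # ['Milestone A', 'Milestone B']
print(css_texts)    # ['Milestone A', 'Milestone B']
```

If other elements on the page might also carry the milestone class, the div.table-list-item.milestone selector is probably the safer of the two, since it requires both names.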
Source: https://stackoverflow.com/questions/42708837/beautifulsoup-partial-div-class-matching