How to select div by text content using Beautiful Soup?

轮回少年 2021-02-08 11:20

Trying to scrape some HTML from something like this. Sometimes the data I need is in div[0], sometimes div[1], etc.

Imagine everyone takes 3-5 classes. One of them is al

3 answers

眼角桃花 (original poster)

2021-02-08 12:12

You can extract them by searching for every <div> element that has score as its class attribute value, and then using a regular expression to pull the Biology score out of its text:

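import re
import sys

from bs4 import BeautifulSoup

# Parse the HTML file named on the command line.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

# Check every <div class="score"> for a Biology grade.
for div in soup.find_all('div', attrs={'class': 'score'}):
    # get_text() is safe when the div has nested tags; .string would be None there.
    t = re.search(r'Biology\s+(\S+)', div.get_text())
    if t:
        print(t.group(1))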

Run it like:

python3 script.py htmlfile

That yields:

A+
B
B
B
B
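
To try the same idea without an input file, here is a minimal sketch that runs against an inline sample. The markup in SAMPLE_HTML and the helper name biology_scores are assumptions made up for illustration; the HTML from the question is not shown in this thread, so the real structure may differ.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup; the question's real HTML is not shown here.
SAMPLE_HTML = """
<div class="student">
  <div class="score">Math B</div>
  <div class="score">Biology A+</div>
</div>
<div class="student">
  <div class="score">Biology B</div>
  <div class="score">History <b>C</b></div>
</div>
"""

# Capture the first run of non-whitespace characters after "Biology".
BIOLOGY = re.compile(r"Biology\s+(\S+)")


def biology_scores(html):
    """Return the Biology grade from every div.score that mentions Biology."""
    soup = BeautifulSoup(html, "html.parser")
    scores = []
    for div in soup.find_all("div", class_="score"):
        # get_text() returns a string even when the div has nested tags,
        # whereas .string would be None in that case.
        match = BIOLOGY.search(div.get_text())
        if match:
            scores.append(match.group(1))
    return scores


print(biology_scores(SAMPLE_HTML))  # ['A+', 'B']
```

Because the match is on the text rather than on position, it does not matter whether the Biology entry is the first, second, or later div. Passing string=re.compile('Biology') to find_all is another option, but it only matches a div whose sole child is a single text node.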
