How to select div by text content using Beautiful Soup?

轮回少年 2021-02-08 11:20

I'm trying to scrape some HTML that looks something like this. Sometimes the data I need is in div[0], sometimes in div[1], and so on.

Imagine everyone takes 3-5 classes. One of them is al

3 Answers
  •  眼角桃花
    2021-02-08 12:12

    You can extract them by searching for every div element whose class attribute value is score, and then using a regular expression to pull the Biology grade out of its text:

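    from bs4 import BeautifulSoup
    import re
    import sys

    # Parse the HTML file given on the command line
    with open(sys.argv[1], 'r') as f:
        soup = BeautifulSoup(f, 'html.parser')

    # Check every <div class="score"> for a "Biology <grade>" entry
    for div in soup.find_all('div', class_='score'):
        # get_text() still works if the div has nested tags; div.string would be None
        t = re.search(r'Biology\s+(\S+)', div.get_text())
        if t:
            print(t.group(1))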
    

    Run it like:

    python3 script.py htmlfile
    

    That yields:

    A+
    B
    B
    B
    B
    
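If you want to select the div by its text content directly, instead of by its position or its class, one option is to search the text nodes with find_all(string=...) and then climb to the enclosing div with find_parent('div'). That way it does not matter whether the grade sits in div[0] or div[1]. This is a minimal sketch under assumptions: the question's real markup was cut off, so SAMPLE_HTML is made-up sample data, and divs_containing and grades_for are hypothetical helper names. The string= keyword needs Beautiful Soup 4.4 or later.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup (the question's real HTML was cut off).
# Assumption: each "<subject> <grade>" pair sits in its own <div>.
SAMPLE_HTML = """
<div class="student">
  <div class="score">Chemistry B</div>
  <div class="score">Biology A+</div>
</div>
<div class="student">
  <div class="score">Biology B</div>
  <div class="score">Math A</div>
  <div class="score">History C</div>
</div>
"""


def divs_containing(soup, word):
    """Hypothetical helper: every <div> holding a text node that contains `word`."""
    pattern = re.compile(re.escape(word))
    # string= matches text nodes; find_parent climbs to the enclosing <div>
    return [s.find_parent('div') for s in soup.find_all(string=pattern)]


def grades_for(html, subject):
    """Hypothetical helper: the grade that follows `subject` in each matching div."""
    soup = BeautifulSoup(html, 'html.parser')
    grades = []
    for div in divs_containing(soup, subject):
        m = re.search(re.escape(subject) + r'\s+(\S+)', div.get_text())
        if m:
            grades.append(m.group(1))
    return grades


print(grades_for(SAMPLE_HTML, 'Biology'))  # ['A+', 'B']
```

Because the match is on the subject text itself, the same helper works however many classes a student takes and in whatever order their divs appear.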
