Find a tag using text it contains using BeautifulSoup

荒凉一梦 submitted on 2021-02-20 05:16:37

Question


I am trying to scrape parts of this page: https://markets.businessinsider.com/stocks/bp-stock using BeautifulSoup, locating tables by searching for text contained in their h2 titles.

When I do:

data_table = soup.find('h2', text=re.compile('RELATED STOCKS')).find_parent('div').find('table')

It correctly gets the table I am after.

When I try to get the "Analyst Opinion" table using a similar line, it returns None:

data_table = soup.find('h2', text=re.compile('ANALYST OPINIONS')).find_parent('div').find('table')

I am guessing that there might be some special characters in the HTML code that prevent re from functioning as expected. I tried this too:

data_table = soup.find('h2', text=re.compile('.*?STOCK.*?INFORMATION.*?', re.DOTALL))

without success.

I would like to get the table that contains the text "Analyst Opinion" without finding all tables, just by checking whether it contains my requested text.

Any idea will be highly appreciated. Best


Answer 1:


You can use a CSS selector to locate the <table>:

import requests
from bs4 import BeautifulSoup

url = 'https://markets.businessinsider.com/stocks/bp-stock'

soup = BeautifulSoup(requests.get(url).text, 'lxml')

table = soup.select_one('div:has(> h2:contains("Analyst Opinions")) table')

for tr in table.select('tr'):
    print(tr.get_text(strip=True, separator=' '))

Prints:

2/26/2018 BP Outperform RBC Capital Markets
9/22/2017 BP Outperform BMO Capital Markets

More about the CSS selectors that BeautifulSoup supports can be found in the Soup Sieve documentation.
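To try the selector without hitting the live site, here is a minimal offline sketch. The HTML fragment and the helper name find_table_by_heading are made up for illustration and only mimic the div > h2 + table layout described above. It also assumes Soup Sieve 2.1 or newer, where the pseudo-class is spelled :-soup-contains (the older :contains spelling is deprecated).

```python
from bs4 import BeautifulSoup

# Hypothetical markup: NOT the real page, just the same div > h2 + table shape.
SAMPLE_HTML = """
<div><h2>Related Stocks</h2><table><tr><td>Shell</td></tr></table></div>
<div><h2>Analyst Opinions</h2><table>
<tr><td>2/26/2018</td><td>BP</td><td>Outperform</td></tr>
</table></div>
"""


def find_table_by_heading(soup, heading):
    """Return the first <table> inside a <div> whose direct-child <h2> contains `heading`.

    Hypothetical helper; the match is case-sensitive.
    """
    selector = f'div:has(> h2:-soup-contains("{heading}")) table'
    return soup.select_one(selector)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_heading(soup, "Analyst Opinions")

# One string per table row, cells joined by a single space.
rows = [tr.get_text(strip=True, separator=" ") for tr in table.select("tr")]
print(rows)
```

Because the text match is case-sensitive, searching for "ANALYST OPINIONS" with this selector would find nothing when the heading is written as "Analyst Opinions".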


EDIT: For a case-insensitive method, you can use the bs4 API with regular expressions (note the flags=re.I). This is the equivalent of the .select_one() call above:

import re
import requests
from bs4 import BeautifulSoup

url = 'https://markets.businessinsider.com/stocks/bp-stock'

soup = BeautifulSoup(requests.get(url).text, 'lxml')

h2 = soup.find(lambda t: t.name=='h2' and re.findall('analyst opinions', t.text, flags=re.I))
table = h2.find_parent('div').find('table')

for tr in table.select('tr'):
    print(tr.get_text(strip=True, separator=' '))
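One possible reason the text= search returns None, besides letter case: the text=/string= filter is compared against tag.string, which is None whenever the tag has more than one child. The sketch below shows this on a made-up fragment. Whether the real page nests extra tags inside its <h2> is an assumption, not something verified here.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the <h2> has two children (a <span> and a text node).
SAMPLE_HTML = """
<div>
  <h2><span>BP</span> Analyst Opinions</h2>
  <table><tr><td>Outperform</td></tr></table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# string= (the current name for text=) is tested against h2.string,
# which is None here because the <h2> has more than one child.
by_string = soup.find("h2", string=re.compile("Analyst Opinions"))

# A function filter can test the concatenated text of the tag instead,
# and re.I makes the comparison case-insensitive.
pattern = re.compile("analyst opinions", flags=re.I)
h2 = soup.find(lambda t: t.name == "h2" and pattern.search(t.get_text()))
table = h2.find_parent("div").find("table")

print(by_string)
print(table.get_text(strip=True))
```

On this fragment the string= lookup finds nothing, while the function filter reaches the table through the same find_parent('div').find('table') chain used in the question.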


Source: https://stackoverflow.com/questions/57578730/find-a-tag-using-text-it-contains-using-beautifulsoup
