Question
The code below prints the values from every numeric tag on the page. Can I use a filter to extract them once for each region?
For example, on https://opensignal.com/reports/2019/04/uk/mobile-network-experience I am interested only in the numbers under the regional analysis tab, for all regions.
import requests
from bs4 import BeautifulSoup

html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all('div', class_='c-ru-graph__rect')
for item in items:
    provider = item.find('span', class_='c-ru-graph__label').text
    prodvalue = item.find_next_sibling('span').find('span', class_='c-ru-graph__number').text
    print(provider + " : " + prodvalue)
I want a table or DataFrame like the one below, for the Eastern region:

                         O2    Vodafone  3     EE
4G Availability          82    76.9      73.0  89.2
Upload Speed Experience  5.6   5.9       6.8   9.5
Any pointers that would help me get this result?
Answer 1:
Here is how I would do it for all regions. This requires bs4 4.7.1 or later, since the code uses the :has() CSS selector. As far as I can see, you have to assume consistent ordering of companies.
import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience")
soup = BeautifulSoup(r.content, 'lxml')  # use 'html.parser' if lxml is not installed
metrics = ['4g-availability', 'video-experience', 'download-speed', 'upload-speed', 'latency']
headers = ['O2', 'Vodafone', '3', 'EE']  # assumes every chart lists the companies in this order
results = []

for region in soup.select('.s-regional-analysis__region'):
    for metric in metrics:
        # Numbers from the chart inside this region that matches the current metric
        providers = [item.text for item in region.select('.c-ru-chart:has([data-metric="' + metric + '"]) .c-ru-graph__number')]
        row = {headers[i]: providers[i] for i in range(len(providers))}
        row['data-metric'] = metric
        row['region'] = region['id']
        results.append(row)

df = pd.DataFrame(results, columns=['region', 'data-metric'] + headers)
print(df)
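If you would rather not rely on the ordering assumption, one option is to read the company labels from each chart and pair them with the numbers. Below is a minimal sketch of that idea. The HTML fragment is synthetic and only a guess at the page structure, built from the class and attribute names that appear in this thread; `chart_values` is a hypothetical helper name, and the real markup may differ.

```python
from bs4 import BeautifulSoup

# Synthetic fragment (an assumption): it reuses class and attribute names
# seen in this thread, but the real page structure may differ.
html = """
<div class="s-regional-analysis__region" id="eastern">
  <div class="c-ru-chart" data-chart-name="4g-availability">
    <div class="c-ru-graph__rect"><span class="c-ru-graph__label">O2</span></div>
    <span><span class="c-ru-graph__number">82.0</span></span>
    <div class="c-ru-graph__rect"><span class="c-ru-graph__label">Vodafone</span></div>
    <span><span class="c-ru-graph__number">76.9</span></span>
  </div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")


def chart_values(chart):
    """Pair each company label in a chart with its number (hypothetical helper)."""
    labels = [s.get_text(strip=True) for s in chart.select(".c-ru-graph__label")]
    numbers = [s.get_text(strip=True) for s in chart.select(".c-ru-graph__number")]
    if len(labels) != len(numbers):
        # Fail loudly instead of silently pairing the wrong values
        raise ValueError("label/number count mismatch in chart")
    return dict(zip(labels, numbers))


rows = []
for region in soup.select(".s-regional-analysis__region"):
    for chart in region.select("div[data-chart-name]"):
        row = {"region": region["id"], "metric": chart["data-chart-name"]}
        row.update(chart_values(chart))
        rows.append(row)

print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as results above. Labels and numbers are still matched by position within a single chart, so this only removes the need to hard-code the company names.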
Answer 2:
Assuming the order of companies is fixed (it is, indeed), you can simply narrow the content you examine to only those divs containing the information you need.
import requests
from bs4 import BeautifulSoup
import pandas as pd

html = requests.get("https://opensignal.com/reports/2019/04/uk/mobile-network-experience").text
soup = BeautifulSoup(html, 'html.parser')

# Restrict the search to the Eastern region block
res = soup.find_all('div', {'id': 'eastern'})

# Locate the two charts of interest and read their display names
aval = res[0].find_all('div', {'data-chart-name': '4g-availability'})
avalname = aval[0].find('span', {'class': 'js-metric-name'}).text
upload = res[0].find_all('div', {'data-chart-name': 'upload-speed'})
uploadname = upload[0].find('span', {'class': 'js-metric-name'}).text

# Company labels come from the first chart; values come from each chart
companies = [i.text for i in aval[0].find_all('span', class_='c-ru-graph__label')]
row1 = [i.text for i in aval[0].find_all('span', class_='c-ru-graph__number')]
row2 = [i.text for i in upload[0].find_all('span', class_='c-ru-graph__number')]

df = pd.DataFrame({avalname: row1,
                   uploadname: row2})
df.index = companies
df = df.T  # metrics as rows, companies as columns
Output:

                         O2    Vodafone  3     EE
4G Availability          82.0  76.9      73.0  89.2
Upload Speed Experience  5.6   5.9       6.8   9.5
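Note that values read through .text are strings, so the table above holds text rather than numbers. If you want to sort or compare them, a conversion step along these lines should work. This is only a sketch: it rebuilds the table from the values shown in the output above instead of scraping, and the names `numeric` and `best` are just illustrative.

```python
import pandas as pd

companies = ["O2", "Vodafone", "3", "EE"]

# Same shape as the table above, with values as strings (as .text returns them)
df = pd.DataFrame(
    {
        "4G Availability": ["82.0", "76.9", "73.0", "89.2"],
        "Upload Speed Experience": ["5.6", "5.9", "6.8", "9.5"],
    },
    index=companies,
).T

# Convert every column to numbers; anything unparseable becomes NaN
numeric = df.apply(pd.to_numeric, errors="coerce")

# For each metric, the company with the highest value
best = numeric.idxmax(axis=1)

print(numeric.dtypes)
print(best)
```

Using errors="coerce" keeps the conversion from raising if the page ever contains a placeholder such as a dash, at the cost of turning that cell into NaN.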
Source: https://stackoverflow.com/questions/56081493/webscrape-interactive-chart-in-python-using-beautiful-soup-with-loops