Question
I want to collect the betting rates from a betting site. After inspecting the page, I noticed that these rates are held in elements with an eventprice class. Following the explanation from here, I wrote this Python code using the BeautifulSoup module:
from bs4 import BeautifulSoup
import urllib.request
import re
url = "http://sports.williamhill.com/bet/fr-fr"
try:
    page = urllib.request.urlopen(url)
except:
    print("An error occured.")
soup = BeautifulSoup(page, 'html.parser')
regex = re.compile('eventprice')
content_lis = soup.find_all('button', attrs={'class': regex})
print(content_lis)
However, I got the following error:
"(...) line 12, in soup = BeautifulSoup(page, 'html.parser') NameError: name 'page' is not defined"
Answer 1:
If you print the exception details you will see what is happening:
try:
    page = urllib.request.urlopen(url)
except Exception as e:
    print(f"An error occurred: {e}")
Output
An error occurred: HTTP Error 403: Forbidden
Traceback (most recent call last):
  File ".../main.py", line 12, in <module>
    soup = BeautifulSoup(page, 'html.parser')
NameError: name 'page' is not defined
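urlopen() raises an exception, so the assignment to page never happens. The except block only prints a message and lets the script carry on, and the next statement then fails with a NameError because page was never defined. In this case the underlying error is a 403, which means the server refused the request; you may need to add authentication in order to access this URL.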
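One way to keep the script from reaching the parsing step with an undefined variable is to wrap the download in a function that returns None on failure. This is only a minimal sketch: fetch_page and its opener parameter are hypothetical names introduced here for illustration (the parameter exists so the function can be exercised without a network connection), not part of the original code.

```python
import urllib.error
import urllib.request


def fetch_page(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the request fails.

    `opener` is a hypothetical hook (defaults to urlopen) so the function
    can be tried out with a stand-in instead of a real network call.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 403.
        print(f"HTTP error {e.code}: {e.reason}")
    except urllib.error.URLError as e:
        # No usable answer at all (DNS failure, refused connection, ...).
        print(f"Connection problem: {e.reason}")
    return None
```

The caller can then check the result before parsing, for example by stopping when fetch_page(url) returns None and only otherwise passing the bytes to BeautifulSoup(html, 'html.parser'). HTTPError is caught first because it is a subclass of URLError.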
Update:
A 403 response means the server understood the request but refuses to fulfil it, so there is no way to access this URL in the way that you are trying to access it.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
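To see what a numeric status code stands for without leaving Python, the standard library's http.HTTPStatus enum can be consulted. The helper name describe_status below is a hypothetical one chosen for this sketch.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the numeric code together with its standard reason phrase."""
    status = HTTPStatus(code)  # raises ValueError for codes the enum does not know
    return f"{status.value} {status.phrase}"


print(describe_status(403))  # prints "403 Forbidden"
```

Each HTTPStatus member also carries a longer description attribute that can be printed in the same way.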
Source: https://stackoverflow.com/questions/65500904/web-scraping-using-python-and-beautiful-soup-error-page-is-not-defined