Question
I'm trying to scrape a website using requests and BeautifulSoup. When I run the code to obtain the tags of the web page, the soup object is blank. I printed the response object to see whether the request was successful, and it was not: the printed result shows response 447. I can't find what 447 means as an HTTP status code. Does anyone know how I can successfully connect and scrape the site?
Code:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://foobar')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.get_text())
Output:
''
When I print the response object:
print(r)
Output:
<Response [447]>
Answer 1:
Most likely the site has detected your activity as automated and is blocking your access. You can often fix this by including browser-like headers in your request to the site.
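import bs4
import requests
url = 'https://foobar'  # placeholder URL from the question
session = requests.Session()
# Send browser-like headers so the request does not look like a script.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}
req = session.get(url, headers=headers)
soup = bs4.BeautifulSoup(req.text, 'html.parser')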
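It can also help to check the status before parsing, so a blocked request is reported instead of silently producing an empty soup. The following is only a sketch under that assumption: make_session and fetch_text are hypothetical helper names, the 10-second timeout is an arbitrary choice, and whether these headers are enough depends on the site.

```python
import requests
from bs4 import BeautifulSoup

# Browser-like headers, same values as in the answer above.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


def make_session():
    """Return a Session that sends the browser-like headers on every request."""
    session = requests.Session()
    session.headers.update(BROWSER_HEADERS)
    return session


def fetch_text(session, url, timeout=10):
    """Fetch url and return the page text, or None if the status is 400 or above."""
    response = session.get(url, timeout=timeout)
    if not response.ok:  # .ok is False for 4xx and 5xx codes, including 447
        print(f"Request failed with status {response.status_code}")
        return None
    soup = BeautifulSoup(response.text, "html.parser")
    return soup.get_text()
```

Calling fetch_text(make_session(), url) with the real URL would then either return the page text or print the failing status code.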
Answer 2:
Sounds like they have browser detection software and they don't like your browser (meaning they don't like your lack of a browser).
While 447 is not a standard HTTP status code, it is occasionally used in SMTP as too many requests.
Without knowing what particular website you are looking at, it's not likely anyone will be able to give you more information. Chances are you just need to add headers.
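One way to confirm that a code is non-standard is to look it up in Python's built-in http.HTTPStatus enum, which raises ValueError for codes it does not know. This is a minimal sketch; describe_status is a hypothetical helper name, and the enum only covers the codes known to your Python version.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase, or a note that the code is non-standard."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "non-standard status code"


print(describe_status(429))  # Too Many Requests
print(describe_status(447))  # non-standard status code
```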
Source: https://stackoverflow.com/questions/53983250/request-returns-response-447