Request Returns Response 447

余生长醉 submitted on 2020-06-17 13:10:50

Question


I'm trying to scrape a website using requests and BeautifulSoup. When I run the code to obtain the tags of the webpage, the soup object is blank. I printed the response object to see whether the request was successful, and it was not: the printed result shows response 447. I can't find what 447 means as an HTTP status code. Does anyone know how I can successfully connect and scrape the site?

Code:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://foobar')
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.get_text())

Output:
''

When I print the response object:

print(r)

Output:
<Response [447]>

Answer 1:


Most likely the site has detected your activity as automated and is blocking your access. You can often fix this by including browser-like headers in your request to the site.

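import bs4
import requests

url = 'https://foobar'  # placeholder URL from the question
session = requests.Session()
# Browser-like headers, so the request does not look like a bare script
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}
response = session.get(url, headers=headers)
soup = bs4.BeautifulSoup(response.text, 'html.parser')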
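
To see whether the headers actually made a difference, it may help to check the status before handing the body to BeautifulSoup, since a blocked request can come back with an empty body. A minimal sketch, assuming a helper named `text_if_ok` (a hypothetical name, not part of requests):

```python
import requests


def text_if_ok(response: requests.Response) -> str:
    """Return the body text, or raise if the server reported an error status.

    `text_if_ok` is a hypothetical helper name used only for illustration.
    """
    # raise_for_status() raises requests.HTTPError for any 4xx or 5xx code,
    # which includes a non-standard code such as 447.
    response.raise_for_status()
    return response.text
```

Passing the object returned by `session.get(...)` to this helper would raise `requests.HTTPError` on a 447 instead of silently producing an empty soup.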



Answer 2:


Sounds like they have browser-detection software and they don't like your browser (meaning they don't like your lack of a browser).

While 447 is not a standard HTTP status code, it is occasionally used in SMTP to signal too many requests.
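
One quick way to check that 447 is non-standard is to look it up in Python's `http.HTTPStatus` enum, which lists the status codes the standard library knows about. A small sketch, where `describe_status` is a hypothetical helper name:

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the standard reason phrase for a code, if the stdlib knows it.

    `describe_status` is a hypothetical helper name used only for illustration.
    """
    try:
        # HTTPStatus(code) raises ValueError for codes that are not in the enum.
        return HTTPStatus(code).phrase
    except ValueError:
        return "non-standard status code"


print(describe_status(404))
print(describe_status(447))
```

A code that falls through to the `ValueError` branch is one the server chose on its own, so its meaning has to come from that site rather than from the HTTP specification.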

Without knowing what particular website you are looking at, it's not likely anyone will be able to give you more information. Chances are you just need to add headers.



Source: https://stackoverflow.com/questions/53983250/request-returns-response-447
