Question
I have a very basic program that searches a query on a website and prints out the search results. Why do I get a 502 error?
import requests
from bs4 import BeautifulSoup  # imported but not used in this snippet
import re                      # imported but not used in this snippet

def main():
    url = "https://www.last10k.com/Search"
    dat = {'q': 'goog'}
    resp = requests.get(url, params=dat)  # comes back as a 502 instead of the search results
    print(resp.content)
Answer 1:
Define a User-Agent header, like this:
import requests

def main():
    url = "https://www.last10k.com/Search"
    dat = {'q': 'goog'}
    # send a browser-style User-Agent instead of the default python-requests one
    resp = requests.get(url, params=dat, headers={'User-Agent': 'Mozilla/5.0'})
    print(resp.status_code)
Why is this required? See the Wikimedia User-Agent policy.
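For comparison, here is a minimal sketch (not part of the original answer) that sends the same request with and without the header; the exact status codes you get depend on the server, but it makes the effect of the User-Agent easy to see:

# Hedged sketch: compare the same request with and without a User-Agent header.
# Many servers reject the default "python-requests/x.y" User-Agent string.
import requests

url = "https://www.last10k.com/Search"
params = {'q': 'goog'}

bare = requests.get(url, params=params)
spoofed = requests.get(url, params=params, headers={'User-Agent': 'Mozilla/5.0'})

print("without User-Agent:", bare.status_code)   # 502 in the question
print("with User-Agent:   ", spoofed.status_code)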
Answer 2:
I had this problem and found that a mix of inspecting the response content and trying the request in a browser helped me find the solution. Maybe it will help you too, so here is what I did:
My request succeeded in a browser but failed with Python, even though the URLs were the same. So I used the debugger. You can also simply print things, but the debugger shows everything there is to see and lets you explore what you might otherwise have missed. I found that the response content of the failed Python request was an error message which, when googled, turned out to be a Ruby problem on the server.
So something behaved differently on the remote side, but what caused it? Adding a User-Agent header, as suggested, was a good idea but did not change anything. So I looked at the other headers and found that the Basic Authentication string looked completely different.
My solution: because of some refactoring I had done, I was feeding the Python request the wrong auth data, and the remote side handled the resulting "permission denied" badly, ending up with a 502 instead of a 403.
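To make that debugging concrete, here is a minimal sketch (the URL and credentials are placeholders, not taken from the original post) that prints the parts of a failed response worth inspecting: the status code, the body, and the request headers that were actually sent, with the Basic Auth credentials passed explicitly so a stale value is easy to spot:

# Hedged sketch with placeholder URL and credentials.
import requests

resp = requests.get(
    "https://example.com/api/search",     # placeholder URL
    params={"q": "goog"},
    headers={"User-Agent": "Mozilla/5.0"},
    auth=("user", "wrong-password"),      # placeholder Basic Auth credentials
)

print(resp.status_code)          # in the answerer's case: 502, even though the real cause was bad auth
print(resp.text[:500])           # the error body often names the real problem
print(resp.request.headers)      # compare these to the headers a browser sends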
Source: https://stackoverflow.com/questions/43239698/502-error-using-requests-to-search-website-in-python