Question
I have a very basic program that searches a query on a website and prints out the search results. Why do I get a 502 error?
import requests
from bs4 import BeautifulSoup  # imported but not used in this snippet
import re                      # imported but not used in this snippet

def main():
    url = "https://www.last10k.com/Search"
    dat = {'q': 'goog'}
    resp = requests.get(url, params=dat)  # comes back as a 502 instead of the search results
    print(resp.content)
Answer 1:
Define a User-Agent header, like this:
import requests

def main():
    url = "https://www.last10k.com/Search"
    dat = {'q': 'goog'}
    # send a browser-style User-Agent instead of the default python-requests one
    resp = requests.get(url, params=dat, headers={'User-Agent': 'Mozilla/5.0'})
    print(resp.status_code)
Why is this required? See the Wikimedia User-Agent policy.
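For comparison, here is a minimal sketch (not part of the original answer) that sends the same request with and without the header; the exact status codes you get depend on the server, but it makes the effect of the User-Agent easy to see:

# Hedged sketch: compare the same request with and without a User-Agent header.
# Many servers reject the default "python-requests/x.y" User-Agent string.
import requests

url = "https://www.last10k.com/Search"
params = {'q': 'goog'}

bare = requests.get(url, params=params)
spoofed = requests.get(url, params=params, headers={'User-Agent': 'Mozilla/5.0'})

print("without User-Agent:", bare.status_code)   # 502 in the question
print("with User-Agent:   ", spoofed.status_code)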
Answer 2:
I had this problem and found that a mix of inspecting the response content and trying the request in a browser helped me find the solution. Maybe it will help you too, so here is what I did:
My request succeeded in a browser but failed with Python, even though the URLs were the same. So I used the debugger. You can also simply print things, but the debugger shows everything there is to see and lets you explore what you might otherwise have missed. I found that the response content of the failed Python request was an error message which, when googled, turned out to be a Ruby problem on the server.
So something behaved differently on the remote side, but what caused it? Adding a User-Agent header, as suggested, was a good idea but did not change anything. So I looked at the other headers and found that the Basic Authentication string looked completely different.
My solution: because of some refactoring I had done, I was feeding the Python request the wrong auth data, and the remote side handled the resulting "permission denied" badly, ending up with a 502 instead of a 403.
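To make that debugging concrete, here is a minimal sketch (the URL and credentials are placeholders, not taken from the original post) that prints the parts of a failed response worth inspecting: the status code, the body, and the request headers that were actually sent, with the Basic Auth credentials passed explicitly so a stale value is easy to spot:

# Hedged sketch with placeholder URL and credentials.
import requests

resp = requests.get(
    "https://example.com/api/search",     # placeholder URL
    params={"q": "goog"},
    headers={"User-Agent": "Mozilla/5.0"},
    auth=("user", "wrong-password"),      # placeholder Basic Auth credentials
)

print(resp.status_code)          # in the answerer's case: 502, even though the real cause was bad auth
print(resp.text[:500])           # the error body often names the real problem
print(resp.request.headers)      # compare these to the headers a browser sends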
Source: https://stackoverflow.com/questions/43239698/502-error-using-requests-to-search-website-in-python