Question
I'm trying to retrieve the following URL: http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004.
import urllib2
response = urllib2.urlopen('http://www.winkworth.co.uk/rent/property/terraced-house-to-rent-in-mill-road--/WOT140129')
response.read()
However, I'm getting an empty string. When I try the URL in a browser or with cURL, it works fine. Any ideas what's going on?
Answer 1:
I got a response when using the requests library but not when using urllib2, so I experimented with the HTTP request headers. As it turns out, the server expects an Accept header; urllib2 doesn't send one, while requests and cURL both send */*.
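For reference, a quick way to confirm what requests sends by default (assuming the requests library is installed) is to inspect its default header set:
import requests
# Default headers attached to every request; note the 'Accept': '*/*' entry.
print(requests.utils.default_headers())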
Send one with urllib2 as well:
url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
# Explicitly advertise that any media type is acceptable.
req = urllib2.Request(url, headers={'accept': '*/*'})
response = urllib2.urlopen(req)
Demo:
>>> import urllib2
>>> url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
>>> len(urllib2.urlopen(url).read())
0
>>> request = urllib2.Request(url, headers={'accept': '*/*'})
>>> len(urllib2.urlopen(request).read())
37197
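For Python 3, where urllib2 was merged into urllib.request, a rough equivalent of the same fix would be:
import urllib.request
url = 'http://www.winkworth.co.uk/sale/property/flat-for-sale-in-masefield-court-london-n5/HIH140004'
# Same idea: send an explicit Accept header with the request.
req = urllib.request.Request(url, headers={'Accept': '*/*'})
response = urllib.request.urlopen(req)
print(len(response.read()))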
The server is at fault here; RFC 2616 states:
If no Accept header field is present, then it is assumed that the client accepts all media types.
Source: https://stackoverflow.com/questions/28118611/python-urllib2-returning-an-empty-string