Question
I am trying to download a GIF file with urllib, but it is throwing this error:
urllib.error.HTTPError: HTTP Error 403: Forbidden
This does not happen when I download from other blog sites. This is my code:
import requests
import urllib.request
url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
source_code = requests.get(url_1,headers = {'User-Agent': 'Mozilla/5.0'})
path = 'C:/Users/roysu/Desktop/src_code/Python_projects/python/web_scrap/myPath/'
full_name = path + ".gif"
urllib.request.urlretrieve(url_1,full_name)
Answer 1:
Don't use urllib.request.urlretrieve. Instead, use the requests library like this:
import requests
url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path = "D:\\Test.gif"
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
with open(path, "wb") as file:
    file.write(response.content)
Hope that this helps!
Answer 2:
Solution:
The remote server is apparently checking the User-Agent header and rejecting requests from Python's urllib. urllib.request.urlretrieve() doesn't let you pass custom HTTP headers; however, you can use urllib.request.URLopener.retrieve():
import urllib.request
url_1='https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path='/home/piyushsambhi/Downloads/'
full_name= path + "testimg.gif"
opener = urllib.request.URLopener()
opener.addheader('User-Agent', 'Mozilla/5.0')
filename, headers = opener.retrieve(url_1, full_name)
print(filename)
NOTE: You are using Python 3, where these functions are considered part of the "Legacy interface", and URLopener has been deprecated. For that reason you should not use them in new code.
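If you want to stay within the standard library without the deprecated URLopener, one option is to attach the header to a urllib.request.Request and pass it to urlopen(). The following is a minimal sketch, not a definitive implementation: the helper names build_request and download are hypothetical, and the 10-second timeout is an arbitrary assumption.

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request for `url` carrying a custom User-Agent header."""
    # build_request is a hypothetical helper, not part of urllib.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest, user_agent="Mozilla/5.0", timeout=10):
    """Stream the resource at `url` into the file `dest` and return `dest`."""
    req = build_request(url, user_agent)
    # urlopen() raises urllib.error.HTTPError for 4xx/5xx responses such as 403.
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks rather than all at once
    return dest
```

With the values from the question this would be called as download(url_1, full_name). Whether 'Mozilla/5.0' is enough to satisfy a given server depends on the site.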
Your code imports requests and calls requests.get(), but never uses the response. You should, though, because requests is much easier than urllib. The following snippet works for me:
import requests
url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path='/home/piyushsambhi/Downloads/'
full_name= path + "testimg1.gif"
r = requests.get(url)
with open(full_name, 'wb') as outfile:
    outfile.write(r.content)
NOTE: Change the path variable to suit your machine and environment.
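One thing to watch with requests: requests.get() does not raise an exception on a 403 by itself, so writing r.content unconditionally can save the server's error page under a .gif name. A minimal sketch of a safer variant is below. The helper names save_response and fetch_image are hypothetical, and the User-Agent value and 10-second timeout are assumptions rather than requirements of the site.

```python
import requests


def save_response(response, dest):
    """Write the body of `response` to `dest`, but only if the request succeeded."""
    # raise_for_status() raises requests.HTTPError for 4xx/5xx codes such as 403.
    response.raise_for_status()
    with open(dest, "wb") as outfile:
        outfile.write(response.content)
    return dest


def fetch_image(url, dest, user_agent="Mozilla/5.0", timeout=10):
    """Download `url` with a custom User-Agent and save it to `dest`."""
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=timeout)
    return save_response(response, dest)
```

For the example above this would be fetch_image(url, full_name). If the server still answers 403, the call raises instead of silently producing a broken file.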
Source: https://stackoverflow.com/questions/64274098/beautiful-soup-urllib-error-httperror-http-error-403-forbidden