Beautiful Soup - urllib.error.HTTPError: HTTP Error 403: Forbidden

Submitted by 情到浓时终转凉″ on 2021-01-27 21:39:22

Question


I am trying to download a GIF file with urllib, but it is throwing this error:

urllib.error.HTTPError: HTTP Error 403: Forbidden

This does not happen when I download from other blog sites. This is my code:

import requests
import urllib.request

url_1 = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

source_code = requests.get(url_1,headers = {'User-Agent': 'Mozilla/5.0'})    

path = 'C:/Users/roysu/Desktop/src_code/Python_projects/python/web_scrap/myPath/'

full_name = path + ".gif"    
urllib.request.urlretrieve(url_1,full_name)

Answer 1:


Don't use urllib.request.urlretrieve. Instead, use the requests library like this:

import requests

url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

path = "D:\\Test.gif"

response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})

# The with block closes the file automatically, even if the write fails
with open(path, "wb") as file:
    file.write(response.content)

Hope that this helps!




Answer 2:


Solution:
The remote server is apparently checking the User-Agent header and rejecting requests from Python's urllib.
urllib.request.urlretrieve() has no parameter for setting HTTP headers. However, you can use
urllib.request.URLopener.retrieve():

import urllib.request

url_1='https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'

path='/home/piyushsambhi/Downloads/'

full_name= path + "testimg.gif"

opener = urllib.request.URLopener()
opener.addheader('User-Agent', 'Mozilla/5.0')
filename, headers = opener.retrieve(url_1, full_name)

print(filename)

NOTE: You are using Python 3, where these functions are considered part of the "Legacy interface" and URLopener has been deprecated. For that reason, you should not use them in new code.
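If you want to stay with the standard library but avoid the legacy interface, one option is to build a urllib.request.Request that carries the User-Agent header and pass it to urllib.request.urlopen(). The following is only a sketch: the function name download_with_user_agent, the 30-second timeout, and the destination file name are assumptions for illustration, not something from the question.

```python
import shutil
import urllib.request


def download_with_user_agent(url, dest_path, user_agent="Mozilla/5.0"):
    """Fetch url with a custom User-Agent header and save the body to dest_path.

    Hypothetical helper: the name and the default values are illustrative.
    """
    # Request accepts a headers dict, which urlretrieve() does not
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response, \
            open(dest_path, "wb") as out:
        # Copy the response body to disk in chunks
        shutil.copyfileobj(response, out)
    return dest_path


# Example call (performs a real network request, so it is left commented out):
# download_with_user_agent(
#     "https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif",
#     "testimg2.gif",
# )
```

If the server still rejects the request, urlopen() raises urllib.error.HTTPError just as urlretrieve() does, so the error in the question would show up in the same way.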

Your code imports requests but never uses the response it gets back. You should use it, though, because it is much easier than urllib. The following snippet works for me:

import requests

url = 'https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif'
path='/home/piyushsambhi/Downloads/'
full_name= path + "testimg1.gif"

r = requests.get(url)
with open(full_name, 'wb') as outfile:
    outfile.write(r.content)

NOTE: Change the path variable to match your machine and environment.
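A slightly more defensive variant of the requests approach is sketched below. It sends the User-Agent header explicitly and calls raise_for_status(), so a 403 raises an exception instead of the error page being written to disk as if it were the GIF. The helper name save_url, the session parameter, and the 30-second timeout are assumptions added for illustration.

```python
import requests


def save_url(url, dest_path, user_agent="Mozilla/5.0", session=None):
    """Download url to dest_path and return the number of bytes written.

    Hypothetical helper: session is optional and defaults to a new
    requests.Session().
    """
    http = session if session is not None else requests.Session()
    response = http.get(url, headers={"User-Agent": user_agent}, timeout=30)
    # Raise requests.HTTPError on 4xx/5xx responses such as 403 Forbidden
    response.raise_for_status()
    with open(dest_path, "wb") as outfile:
        outfile.write(response.content)
    return len(response.content)


# Example call (performs a real network request, so it is left commented out):
# save_url(
#     "https://goodlogo.com/images/logos/small/nike_classic_logo_2355.gif",
#     "testimg3.gif",
# )
```

Accepting an optional session is a design choice rather than a requirement. It lets you reuse one connection for several downloads and makes the function easy to exercise without touching the network.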



Source: https://stackoverflow.com/questions/64274098/beautiful-soup-urllib-error-httperror-http-error-403-forbidden
