Can't open Unicode URL with Python

慢半拍i 2020-12-09 20:02

Using Python 2.5.2 and Linux Debian, I'm trying to get the content from a Spanish URL that contains a Spanish char 'í':

import urllib
url = u         


        
5 answers
  •  旧时难觅i
    2020-12-09 20:55

    I'm having a similar problem right now. I'm trying to download images whose URLs I retrieve from the server in a JSON file. Some of the image URLs contain non-ASCII characters, and this throws an error:

    import os
    import urllib.request

    for image in product["images"]:
        filename = os.path.basename(image)
        filepath = product_path + "/" + filename
        urllib.request.urlretrieve(image, filepath)  # error!
    

    UnicodeEncodeError: 'ascii' codec can't encode character '\xc7' in position ...


    I tried calling .encode("UTF-8") on the URL, but it didn't help:

    # coding=UTF-8
    import urllib.request
    url = u"http://example.com/wp-content/uploads/2018/09/İMAGE-1.png"
    url = url.encode("UTF-8")  # url is now bytes, not str
    urllib.request.urlretrieve(url, r"D:\image-1.jpg")
    

    This just throws another error:

    TypeError: cannot use a string pattern on a bytes-like object


    Then I gave urllib.parse.quote(url) a go:

    import urllib.parse
    import urllib.request
    url = "http://example.com/wp-content/uploads/2018/09/İMAGE-1.png"
    url = urllib.parse.quote(url)  # quotes the whole URL, including "http:"
    urllib.request.urlretrieve(url, r"D:\image-1.jpg")
    

    and again, this throws another error:

    ValueError: unknown url type: 'http%3A//example.com/wp-content/uploads/2018/09/%C4%B0MAGE-1.png'

    The : in "http://..." also got escaped (quote() treats only / as safe by default), so urllib no longer recognizes the scheme. I think this is the cause of the problem.
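
    To see the escaping in isolation, here is a minimal sketch that compares the default behaviour of quote() with a call that also marks ":" as safe. The variable names are only for illustration. Note that safe=":/" is a shortcut that fits simple URLs like this one; a URL with a query string or existing %XX escapes would need more care.

```python
from urllib.parse import quote

url = "http://example.com/wp-content/uploads/2018/09/İMAGE-1.png"

# Default: only "/" is treated as safe, so the ":" after the scheme is escaped.
default_quoted = quote(url)

# Adding ":" to the safe set keeps "http://" intact.
scheme_safe = quote(url, safe=":/")

print(default_quoted)
print(scheme_safe)
```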

    So, I've figured out a workaround. I just quote/escape the path, not the whole URL.

    import urllib.request
    import urllib.parse
    url = "http://example.com/wp-content/uploads/2018/09/İMAGE-1.png"
    parts = urllib.parse.urlparse(url)
    # Quote only the path; the scheme and host stay as they are.
    url = parts.scheme + "://" + parts.netloc + urllib.parse.quote(parts.path)
    urllib.request.urlretrieve(url, r"D:\image-1.jpg")
    

    This is what the URL looks like: "http://example.com/wp-content/uploads/2018/09/%C4%B0MAGE-1.png", and now I can download the image.
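
    The same idea can be wrapped in a small helper that also keeps any query string and fragment, which the concatenation above drops. This is only a sketch: quote_url_path is a hypothetical name, and it assumes the non-ASCII characters appear only in the path (not in the host name or the query).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode only the path of `url`, leaving the other parts untouched."""
    parts = urlsplit(url)
    # safe="/%" keeps path separators and avoids re-encoding existing %XX escapes.
    quoted_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, quoted_path, parts.query, parts.fragment)
    )


print(quote_url_path("http://example.com/wp-content/uploads/2018/09/İMAGE-1.png"))
print(quote_url_path("http://example.com/a b/İ.png?size=large"))
```

    The returned string can then be passed to urllib.request.urlretrieve() as before.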
