Saving Image from URL using Python Requests - URL type error

Submitted by 折月煮酒 on 2021-02-11 06:55:02

Question


Using the following code:

    with open('newim','wb') as f:
        f.write(requests.get(repr(url)))

where the url is:

    url = ''

I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python33\lib\site-packages\requests\api.py", line 69, in get
        return request('get', url, params=params, **kwargs)
      File "C:\Python33\lib\site-packages\requests\api.py", line 50, in request
        response = session.request(method=method, url=url, **kwargs)
      File "C:\Python33\lib\site-packages\requests\sessions.py", line 465, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\Python33\lib\site-packages\requests\sessions.py", line 567, in send
        adapter = self.get_adapter(url=request.url)
      File "C:\Python33\lib\site-packages\requests\sessions.py", line 641, in get_adapter
        raise InvalidSchema("No connection adapters were found for '%s'" % url)

I have seen other posts with what, at first glance, appears to be a similar problem, but I haven't had any luck just adding 'https://' or anything like that. I seriously want to avoid having to do this in webdriver+AutoIt or something, because I have to do a similar exercise for thousands of images.


Answer 1:


This is an image encoded in base64. Quoting the URL below: "base64 equals to text (string) representation of the image itself".

Read this for a detailed explanation: http://www.stoimen.com/blog/2009/04/23/when-you-should-use-base64-for-images/

In order to use them you'll have to implement a base64 decoder. Luckily SO already provides you with the answer on how to do it:

Python base64 data decode
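As a minimal sketch of that decoding step, assuming the URL has the usual data:<mime type>;base64,<payload> shape: the helper name decode_data_uri and the sample URI are illustrative only (the sample wraps just the 8-byte PNG signature, not a real image), and the file is written to a temporary directory purely for the demonstration.

```python
import base64
import os
import tempfile


def decode_data_uri(uri):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # Everything before the first comma is the header, the rest is the payload.
    header, _, payload = uri[len("data:"):].partition(",")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URIs are handled here")
    mime_type = header[:-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Placeholder URI built from the 8-byte PNG signature (hypothetical sample).
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
mime_type, data = decode_data_uri(sample)

# Write the decoded bytes in binary mode, as the question's code intended.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "newim.png")
    with open(path, "wb") as f:
        f.write(data)
    written = os.path.getsize(path)
```

With a real data URI in place of sample, writing data to a file opened with 'wb' takes the place of the requests.get() call in the question; no network request is involved.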




Answer 2:


There seems to be a problem with your understanding of the concept of embedded images. The url you have posted is, actually, what your browser returns when you select 'View Image' or 'Copy Image Location' (or something similar, depending on the browser) from the context menu, and formally is called a data URI.

It is not an HTTP URL pointing to an image, and you cannot use it to retrieve an image from any server: this is exactly what requests points out in the error message.
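One way to see the difference is to look at the URL scheme before deciding how to handle the value. The sketch below uses urlsplit from the standard library; the helper name is_data_uri and both sample URLs are assumptions for illustration (the payload is only the PNG signature, not a full image).

```python
from urllib.parse import urlsplit


def is_data_uri(url):
    """Return True when the URL uses the data: scheme (an embedded resource)."""
    return urlsplit(url).scheme.lower() == "data"


page_url = "http://server/dir/images.html"
embedded = "data:image/png;base64,iVBORw0KGgo="  # placeholder payload

for candidate in (page_url, embedded):
    kind = "embedded data URI" if is_data_uri(candidate) else "not a data URI"
    print(urlsplit(candidate).scheme, "->", kind)
```

By default requests only mounts connection adapters for http:// and https://, so a data: URL has nothing to connect to; the bytes are already in the string.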


So, how do we get these images? The following script will handle this task:

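    import binascii as ba

    import requests
    from lxml import html

    url = "<Page URL goes here>"  # e.g. http://server/dir/images.html
    page = requests.get(url)
    struct = html.fromstring(page.text)  # parse the HTML into an element tree
    images = struct.xpath('//img/@src')  # src attribute of every <img>

    i = 0
    for img in images:
        # Only embedded base64 images are handled; skip ordinary links.
        if not img.startswith('data:image/') or 'base64,' not in img:
            continue
        i += 1
        # 'data:image/png;base64,...' -> 'png'
        ext = img.partition('data:image/')[2].split(';')[0]
        with open('newim' + str(i) + '.' + ext, 'wb') as f:
            # Decode the base64 payload and write the raw bytes.
            f.write(ba.a2b_base64(img.partition('base64,')[2]))

    print("Done")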

To run it you will need to install the lxml library in addition to requests (for example, with pip install lxml).


Here follows a short description of how the script functions:

First it requests the url from the server and, after it gets the server's response, it stores it in a Response object (page).

Then it utilizes html.fromstring() from lxml to transform the "textified" content of page into a tree-structure which can be processed by commands utilizing XPath syntax, like this one: images = struct.xpath('//img/@src').

The result is a list containing the contents of the src attribute of every image in the page. In this case (embedded images) these are the data URIs.

Then, for every image in the list, it first gets the image type (which will be used as the newim's extension), using partition() and split() and stores it in ext. Then it converts the base64 encoded data to binary (using a2b_base64() from binascii module) and writes the output to the file.
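The string handling described above can be traced on a single value. This is only an illustration: the sample URI is a made-up placeholder built from the 8-byte PNG signature, and the intermediate variable names are not part of the script.

```python
import base64
import binascii

# Hypothetical sample: a data URI wrapping only the PNG signature bytes.
payload = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
img = "data:image/png;base64," + payload

# partition() returns (before, separator, after); index 2 is the part after.
after_prefix = img.partition("data:image/")[2]  # 'png;base64,<payload>'
ext = after_prefix.split(";")[0]                # 'png'
encoded = img.partition("base64,")[2]           # '<payload>'
raw = binascii.a2b_base64(encoded)              # the original bytes

# The same steps applied to an SVG header give an awkward extension.
svg_ext = "data:image/svg+xml;base64,".partition("data:image/")[2].split(";")[0]
```

Note that the extension is simply whatever follows image/ in the MIME type, so an SVG image would end up with the extension svg+xml and may need special handling.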


As a small demo, save this HTML code (as, e.g., images.html) somewhere on your server:

    <h1>Images</h1>
    <img src="" />
    <br />
    <img src=""></img>
    <br />
    <img src=""/>

and point to it in the script: requests.get("http://yourserver/somedir/images.html").

When you run the script you will get three images, named newim1.png, newim2.png and newim3.jpg respectively.


As a reminder, note that this script (in its current form) will only handle embedded images. If you also want to process ordinary linked images, you have to modify it accordingly (but this is not difficult).
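One possible shape for that modification is sketched below; it is not part of the original script. The names image_bytes and fetch_bytes and the fetch parameter are hypothetical, and the download uses urlopen from the standard library so the sketch has no third-party dependency (with requests, the equivalent would be requests.get(url).content). Relative src values are resolved against the page URL with urljoin.

```python
import binascii
from urllib.parse import urljoin
from urllib.request import urlopen


def fetch_bytes(url):
    """Download a linked image and return its raw bytes."""
    with urlopen(url) as resp:
        return resp.read()


def image_bytes(src, page_url, fetch=fetch_bytes):
    """Return the raw bytes for one <img src> value, embedded or linked."""
    if src.startswith("data:"):
        if ";base64," not in src:
            raise ValueError("only base64-encoded data URIs are handled here")
        # Embedded image: the payload is already inside the attribute.
        return binascii.a2b_base64(src.partition("base64,")[2])
    # Linked image: resolve relative paths against the page, then download.
    return fetch(urljoin(page_url, src))


# Demo with a stub fetcher so nothing is actually downloaded.
page_url = "http://server/dir/images.html"
embedded = image_bytes("data:image/png;base64,aGVsbG8=", page_url)
linked = image_bytes("pics/a.png", page_url, fetch=lambda u: u.encode())
```

For linked images the file extension would have to come from the URL path or the response's Content-Type header rather than from the data URI prefix.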



Source: https://stackoverflow.com/questions/33048636/saving-image-from-url-using-python-requests-url-type-error
