I have read the related questions here, but I still don't understand.
Actually, the code below does what I need except for renaming the image, so I tried changing the name in items.py.
My code is based on the question "Scrapy Image Pipeline: How to rename images?". I tested it a week ago and it works with my own spiders.
import scrapy
from scrapy.pipelines.images import ImagesPipeline


# This pipeline is designed for an item with multiple images
class ImagesWithNamesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Values in the "image_names" field must end with ".jpg".
        # You can replace "image_names" with your own image-name field,
        # but it must be a list with one name per image URL.
        for (image_url, image_name) in zip(item[self.IMAGES_URLS_FIELD], item["image_names"]):
            # Carry the desired name along with the request so file_path can read it
            yield scrapy.Request(url=image_url, meta={"image_name": image_name})

    def file_path(self, request, response=None, info=None):
        # Return the relative path under which the image is stored
        image_name = request.meta["image_name"]
        return image_name
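One way to build the image_names list that this pipeline reads is sketched below. The helper name build_image_names and the title-plus-index naming scheme are assumptions made for illustration and are not part of Scrapy; the only requirements taken from the comments above are that the names form a list parallel to the image URLs and that each name ends in ".jpg".

```python
import re


def build_image_names(title, image_urls):
    """Return one file name per URL, for example "my-product_0.jpg".

    Hypothetical helper: the slug-plus-index scheme is an assumption,
    chosen so that names are unique within an item and end in ".jpg".
    """
    # Keep only letters, digits, "-" and "_" so the name is a safe relative path.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", title).strip("-").lower() or "image"
    # One name per URL, in the same order, so zip() pairs them correctly.
    return [f"{slug}_{index}.jpg" for index, _ in enumerate(image_urls)]
```

In a spider, the result could be assigned with something like item["image_names"] = build_image_names(title, item["image_urls"]), assuming the item declares an image_names field in items.py.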
Here is how the ImagesPipeline works:
The pipeline executes image_downloaded -> get_images -> file_path in order ("->" means "invokes").
image_downloaded: saves the images that get_images returns by invoking persist_file
get_images: converts images to JPEG
file_path: returns the relative path of the image
I scanned through the source code of ImagesPipeline and found no special field for renaming an image. By default, Scrapy names it this way:
def file_path(self, request, response=None, info=None):
    # Excerpt: in the full source, url is taken from request.url
    image_guid = hashlib.sha1(to_bytes(url)).hexdigest()  # change to request.url after deprecation
    return 'full/%s.jpg' % (image_guid)
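To see what that default produces without running a crawl, the same naming scheme can be reproduced in a few standalone lines. This is a sketch, not Scrapy's code: the function name default_image_path is hypothetical, and UTF-8 encoding of the URL is assumed in place of Scrapy's to_bytes helper.

```python
import hashlib


def default_image_path(url):
    """Mimic the default naming shown above: 'full/<sha1 of the URL>.jpg'."""
    # Assumption: the URL is encoded as UTF-8 before hashing.
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "full/%s.jpg" % image_guid
```

Because the name is a hash of the URL, it is stable across runs but carries no human-readable meaning, which is why a custom name has to come from somewhere else.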
Therefore, we should override the file_path method. According to the source code of FilesPipeline, which ImagesPipeline inherits from, we only need to return a relative path and persist_file will take care of the rest.
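The reasoning can be illustrated with a toy model of the call chain described above. This is not Scrapy's implementation: the class names, the dict standing in for a request, and the in-memory store are all simplifications assumed for the sketch. It only shows that when storage takes its path from file_path, overriding that one method is enough to change the stored name.

```python
class ToyImagesPipeline:
    """Toy stand-in for the call chain; NOT Scrapy's code."""

    def __init__(self):
        # Maps relative path -> image bytes, standing in for the disk.
        self.store = {}

    def file_path(self, request):
        # Default: a path derived from the request, like the hashed name.
        return "full/%s.jpg" % request["id"]

    def get_images(self, request, body):
        # The real pipeline also converts the image to JPEG at this step.
        yield self.file_path(request), body

    def persist_file(self, path, data):
        self.store[path] = data

    def image_downloaded(self, request, body):
        # Save every image that get_images yields under its path.
        for path, data in self.get_images(request, body):
            self.persist_file(path, data)


class ToyNamedPipeline(ToyImagesPipeline):
    def file_path(self, request):
        # Only this method changes: return the name carried in the request.
        return request["meta"]["image_name"]


pipeline = ToyNamedPipeline()
pipeline.image_downloaded({"id": "abc", "meta": {"image_name": "cat.jpg"}}, b"...")
```

After this runs, the toy store holds its single entry under "cat.jpg", while the base class would have stored the same bytes under "full/abc.jpg".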