Scrapy image download how to use custom filename

眼角桃花 2020-11-28 07:27

For my Scrapy project I'm currently using the ImagesPipeline. The downloaded images are stored with the SHA1 hash of their URL as the file name.

How can I s

6 answers
  •  星月不相逢
    2020-11-28 07:50

    I used a quick and dirty hack for this. In my case, I stored each image's title in my item feed, and each item had only one entry in image_urls, so I wrote the following script. It renames the image files in the /images/full/ directory to the corresponding title from the item feed, which I had exported as JSON.

    import json
    import os

    # Directory where ImagesPipeline stored the files, and the exported item feed.
    img_dir = os.path.join(os.getcwd(), 'images', 'full')
    feed_path = os.path.join(os.getcwd(), 'data.json')

    with open(feed_path, 'r') as feed_file:
        items = json.load(feed_file)

    for item in items:
        # Skip items that have no downloaded image.
        if item['images']:
            # 'path' looks like 'full/<sha1>.<ext>'; keep only the file name.
            cur_file = item['images'][0]['path'].split('/')[-1]
            cur_format = cur_file.split('.')[-1]
            new_title = '%s.%s' % (item['title'], cur_format)
            # Rename <sha1>.<ext> to <title>.<ext> in place.
            os.rename(os.path.join(img_dir, cur_file),
                      os.path.join(img_dir, new_title))
    

    It's a hack and not recommended, but it works as a naive alternative approach.
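One weakness of renaming by title is that a title may contain characters that are not valid in a file name, or two items may share the same title, in which case the second rename would fail or overwrite the first file depending on the platform. The following is a minimal sketch of how the rename targets could be computed more defensively. The helper names `safe_filename` and `plan_renames` are hypothetical, and the item layout (`title`, `images[0]['path']`) is assumed to match the feed used in the script above.

```python
import os
import re


def safe_filename(title, extension):
    """Build a file name from a title, replacing characters that are unsafe in paths."""
    # Replace path separators and characters that Windows rejects.
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title).strip()
    # Fall back to a placeholder when nothing usable is left.
    if not cleaned:
        cleaned = 'untitled'
    return '%s.%s' % (cleaned, extension) if extension else cleaned


def plan_renames(items):
    """Return (old_name, new_name) pairs, adding a numeric suffix on name clashes."""
    used = set()
    plan = []
    for item in items:
        # Skip items that have no downloaded image.
        if not item.get('images'):
            continue
        old_name = item['images'][0]['path'].split('/')[-1]
        extension = os.path.splitext(old_name)[1].lstrip('.')
        new_name = safe_filename(item['title'], extension)
        # Append _1, _2, ... until the name is not already taken.
        stem, dot_ext = os.path.splitext(new_name)
        candidate = new_name
        counter = 1
        while candidate in used:
            candidate = '%s_%d%s' % (stem, counter, dot_ext)
            counter += 1
        used.add(candidate)
        plan.append((old_name, candidate))
    return plan
```

Each resulting pair could then be passed to `os.rename` with the image directory prepended, as in the script above. The sketch only checks clashes among the planned names, not against files that already exist on disk.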
