Downloading pictures with scrapy

爱一瞬间的悲伤 2020-12-16 05:17

I'm starting with Scrapy, and I've run into my first real problem: downloading pictures. This is my spider:

from scrapy.contrib.spiders import CrawlSpider, R         


        
2 Answers
  • 2020-12-16 05:22

    I think the image URL you scraped is relative. To construct the absolute URL, use urlparse.urljoin (urllib.parse.urljoin in Python 3):

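    import urlparse  # Python 2 module

    def parse(self, response):
        ...
        image_relative_url = hxs.select("...").extract()[0]
        # Resolve the (possibly relative) src against the URL of the page it came from.
        image_absolute_url = urlparse.urljoin(response.url, image_relative_url.strip())
        # The images pipeline expects a list of URLs, even for a single image.
        item['image_urls'] = [image_absolute_url]
        ...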
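The URL resolution step can be tried outside Scrapy. This is a small standalone sketch using Python 3's `urllib.parse.urljoin`; the helper name `to_absolute` and the example URLs are hypothetical, with `page` standing in for `response.url`:

```python
from urllib.parse import urljoin


def to_absolute(page_url: str, src: str) -> str:
    """Resolve a scraped (possibly relative) image src against the page URL."""
    # strip() drops stray whitespace or newlines around the extracted value.
    return urljoin(page_url, src.strip())


# Hypothetical page URL, standing in for response.url.
page = "http://www.domain.com/gallery/page1.html"

print(to_absolute(page, "/images/cat.jpg"))    # http://www.domain.com/images/cat.jpg
print(to_absolute(page, " thumbs/cat.jpg\n"))  # http://www.domain.com/gallery/thumbs/cat.jpg
print(to_absolute(page, "http://cdn.example.com/cat.jpg"))  # http://cdn.example.com/cat.jpg
```

A root-relative src replaces the page's whole path, a document-relative one is resolved against the page's directory, and an absolute URL passes through unchanged. In Scrapy 1.0 and later, `response.urljoin(src)` performs the same join against the response URL.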
    

    I haven't used ITEM_PIPELINES myself, but the docs say:

    In a Spider, you scrape an item and put the URLs of its images into a image_urls field.

    So, item['image_urls'] should be a list of image URLs. But your code has:

    item['image_urls'] = 'http://www.domain.com' + item['image_urls']
    

    So I guess the pipeline iterates over your single URL string character by character, treating each character as a URL.
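The guess above rests on how Python iterates strings, which is easy to check in isolation. This sketch only illustrates that iteration behaviour; it assumes, as the answer does, that the pipeline loops over whatever is stored in `image_urls`. The URL is a placeholder:

```python
# Placeholder URL standing in for the concatenated value in the question.
url = "http://www.domain.com/img/a.jpg"

# Looping over a bare string yields single characters, not URLs.
as_string = [u for u in url]
print(as_string[:4])  # ['h', 't', 't', 'p']

# Wrapping the string in a list yields the full URL exactly once.
as_list = [u for u in [url]]
print(as_list)  # ['http://www.domain.com/img/a.jpg']
```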

  • 2020-12-16 05:39

    I think you may need to provide your image URL to the item inside a list:

    item['image_urls'] = [ 'http://www.domain.com' + item['image_urls'] ]
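Besides putting the URL in a list, the images pipeline itself has to be enabled for `image_urls` to be processed. A minimal `settings.py` sketch, assuming Scrapy 1.0 or later (older releases, like the one implied by the question's `scrapy.contrib` import, used a different module path under `scrapy.contrib`); the `IMAGES_STORE` value is a hypothetical directory:

```python
# settings.py (sketch): enable Scrapy's built-in images pipeline.
ITEM_PIPELINES = {
    # Module path for Scrapy >= 1.0; the number sets pipeline order (lower runs first).
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Hypothetical local directory where downloaded images are saved.
IMAGES_STORE = "/path/to/images"
```

The pipeline needs Pillow installed. If the spider uses a `scrapy.Item` subclass rather than a dict, the item also needs an `images` field, which is where the pipeline records the download results.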
    