Scrapy: when a request fails (e.g. 404, 500), how to make an alternative request?


Question


I have a problem with Scrapy: when a request fails (e.g. 404, 500), how can I make an alternative request? For example, two links can both provide the price info; if one fails, I want to request the other automatically.


Answer 1:


Use "errback" in the Request like errback=self.error_handler where error_handler is a function (just like callback function) in this function check the error code and make the alternative Request.

see errback in the scrapy documentation: http://doc.scrapy.org/en/latest/topics/request-response.html
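
For illustration, here is a minimal sketch of that approach. The spider name, URLs, and the parse_price/error_handler names are hypothetical, and it assumes a Scrapy version where an errback can yield new requests (true for current releases):

import scrapy
from scrapy.spidermiddlewares.httperror import HttpError


class PriceSpider(scrapy.Spider):
    name = "price_spider"  # hypothetical name

    def start_requests(self):
        # Hypothetical primary link; the errback is called on HTTP errors,
        # DNS lookup failures, timeouts, etc.
        yield scrapy.Request("http://example.com/price-primary",
                             callback=self.parse_price,
                             errback=self.error_handler)

    def parse_price(self, response):
        # extract the price info here
        pass

    def error_handler(self, failure):
        # failure is a twisted.python.failure.Failure
        if failure.check(HttpError):
            status = failure.value.response.status
            if status in (404, 500):
                # request the alternative link (hypothetical URL)
                yield scrapy.Request("http://example.com/price-alternative",
                                     callback=self.parse_price)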




Answer 2:


Just set handle_httpstatus_list = [404, 500] and check for the status code in the parse method. Here's an example:

from scrapy import Request, Spider


class MySpider(Spider):
    # Let these error statuses reach the callback instead of being
    # filtered out by Scrapy's HttpError middleware
    handle_httpstatus_list = [404, 500]
    name = "my_crawler"

    start_urls = ["http://github.com/illegal_username"]

    def parse(self, response):
        # If the first request failed, fall back to the alternative URL
        if response.status in self.handle_httpstatus_list:
            return Request(url="https://github.com/kennethreitz/", callback=self.after_404)

    def after_404(self, response):
        print(response.url)

        # parse the page and extract items

Also see:

  • How to get the scrapy failure URLs?
  • Scrapy and response status code: how to check against it?
  • How to retry for 404 link not found in scrapy?

Hope that helps.



Source: https://stackoverflow.com/questions/16909106/scrapyin-a-request-fails-eg-404-500-how-to-ask-for-another-alternative-reque
