Add a delay to a specific scrapy Request

问题

Is it possible to delay the retry of a particular scrapy Request. I have a middleware which needs to defer the request of a page until a later time. I know how to do the basic deferal (end of queue), and also how to delay all requests (global settings), but I want to just delay this one individual request. This is most important near the end of the queue, where if I do the simple deferral it immediately becomes the next request again.

回答1:

One way would be to add a middleware to your Spider (source, linked):

# File: middlewares.py

from twisted.internet import reactor
from twisted.internet.defer import Deferred


class DelayedRequestsMiddleware(object):
    def process_request(self, request, spider):
        delay_s = request.meta.get('delay_request_by', None)
        if not delay_s:
            return

        deferred = Deferred()
        reactor.callLater(delay_s, deferred.callback, None)
        return deferred

Which you could later use in your Spider like this:

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    custom_settings = {
        'DOWNLOADER_MIDDLEWARES': {'middlewares.DelayedRequestsMiddleware': 123},
    }

    def start_requests(self):
        # This request will have itself delayed by 5 seconds
        yield scrapy.Request(url='http://quotes.toscrape.com/page/1/', 
                             meta={'delay_request_by': 5})
        # This request will not be delayed
        yield scrapy.Request(url='http://quotes.toscrape.com/page/2/')

    def parse(self, response):
        ...  # Process results here

回答2:

sleep() Method suspends execution for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time.

So that you have to import time module in your spider.

import time

Then you can add the sleep method where you need the delay.

time.sleep( 5 )

回答3:

A solution that uses twisted.reactor.callLater() is here:

https://github.com/ArturGaspar/scrapy-delayed-requests

来源：https://stackoverflow.com/questions/19135875/add-a-delay-to-a-specific-scrapy-request

标签

python

scrapy