Scrapy request+response+download time

Submitted by 六眼飞鱼酱① on 2020-01-01 08:45:27

Question


UPD: I am not closing this question because I think my approach is not as clear as it should be.

Is it possible to get the current request + response + download time so that it can be saved to an Item?

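In "plain" Python I do:

from time import time
import urllib2  # Python 2; the equivalent module in Python 3 is urllib.request

start_time = time()
urllib2.urlopen('http://example.com').read()
elapsed = time() - start_time  # seconds for request + response + download

But how can I do this with Scrapy?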

UPD:

This solution is good enough for me, but I'm not sure about the quality of the results. If many connections fail with timeout errors, the measured download time may be wrong (even DOWNLOAD_TIMEOUT * 3).

settings.py

DOWNLOADER_MIDDLEWARES = {
    'myscraper.middlewares.DownloadTimer': 0,
}

middlewares.py

from time import time
from scrapy.http import Response


class DownloadTimer(object):
    def process_request(self, request, spider):
        request.meta['__start_time'] = time()
        # Returning None does not block the middlewares with a higher
        # order number; they still process the request.
        return None

    def process_response(self, request, response, spider):
        request.meta['__end_time'] = time()
        return response  # process_response must return a response (or a request)

    def process_exception(self, request, exception, spider):
        request.meta['__end_time'] = time()
        # Swallow the exception and hand back a placeholder response
        # with a made-up status code.
        return Response(
            url=request.url,
            status=110,
            request=request)

Inside spider.py, in def parse(...):

log.msg('Download time: %.2f - %.2f = %.2f' % (
    response.meta['__end_time'], response.meta['__start_time'],
    response.meta['__end_time'] - response.meta['__start_time']
), level=log.DEBUG)

Answer 1:


You could write a Downloader Middleware which would time each request. It would add a start time to the request before it's made and then a finish time when it's finished. Typically, arbitrary data such as this is stored in the Request.meta attribute. This timing information could later be read by your spider and added to your item.
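
The last step, reading the timing back in the spider and adding it to the item, could look roughly like the sketch below. It assumes the `__start_time` and `__end_time` meta keys set by the middleware in the question; the helper name `timing_fields` and the `download_time` item field are hypothetical names chosen for illustration.

```python
def timing_fields(meta):
    """Build item fields from the timestamps stored in Request.meta.

    `meta` is expected to be response.meta (a dict). The key names
    '__start_time' and '__end_time' are the ones used by the
    DownloadTimer middleware in the question; 'download_time' is a
    hypothetical item field name.
    """
    start = meta.get('__start_time')
    end = meta.get('__end_time')
    if start is None or end is None:
        # The middleware was not enabled or did not see this request.
        return {'download_time': None}
    return {'download_time': end - start}


# Hypothetical use inside a spider callback:
#
# def parse(self, response):
#     item = {'url': response.url}
#     item.update(timing_fields(response.meta))
#     yield item
```

Keeping the calculation in a small function that only takes a dict makes it easy to check without running a crawl. Scrapy also records a `download_latency` key in `Request.meta` once a response has been downloaded; it covers only the fetch itself, so it may differ from the value measured by a custom middleware.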

This downloader middleware sounds like it could be useful on many projects.



Source: https://stackoverflow.com/questions/15831955/scrapy-requestresponsedownload-time
