Recording the total time taken for running a spider in scrapy

Happy的楠姐 2021-01-12 17:18

I am using Scrapy to scrape a site.

I wrote a spider that fetches all the items from the page and saves them to a CSV file, and now I want to save the total exec

3 Answers
  • 2021-01-12 17:49

    I'm quite a beginner, but I did it in a slightly simpler way and I hope it makes sense.

    import datetime
    

    Then give the spider two instance attributes, self.starting_time and self.ending_time.

    Inside the constructor of the spider class, set the starting time as

    def __init__(self, name=None, **kwargs):
        super().__init__(name=name, **kwargs)  # keep Scrapy's own initialisation
        self.starting_time = datetime.datetime.now()
    

    After that, use the closed method to compute the difference between the ending and the starting times, i.e.

    def closed(self, reason):
        # Scrapy passes the reason the spider closed, not a response
        self.ending_time = datetime.datetime.now()
        duration = self.ending_time - self.starting_time
        print(duration)
    

    That's pretty much it. The closed method is called as soon as the spider has finished. See the Scrapy documentation for details.
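
    Put together, the fragments above can be sketched as a small mixin. This is only an illustration under a few assumptions: the name RunTimeMixin and the duration attribute are hypothetical, and time.monotonic() is swapped in for the subtraction of two datetime values so that a system clock adjustment during the crawl does not distort the result.

    ```python
    import datetime
    import time


    class RunTimeMixin:
        """Records how long a spider ran; list it ahead of scrapy.Spider."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            # Wall-clock start, handy for reporting when the run began.
            self.starting_time = datetime.datetime.now()
            # Monotonic start, unaffected by system clock adjustments.
            self._started = time.monotonic()

        def closed(self, reason):
            # Scrapy calls closed(reason) once the spider has finished.
            self.ending_time = datetime.datetime.now()
            elapsed = time.monotonic() - self._started
            self.duration = datetime.timedelta(seconds=elapsed)
            print(f"Spider closed ({reason}); total run time: {self.duration}")


    # Hypothetical usage inside a Scrapy project:
    #
    #     class MySpider(RunTimeMixin, scrapy.Spider):
    #         name = "my_spider"
    ```

    Because the mixin forwards all constructor arguments with super().__init__(), it should not interfere with spider arguments passed on the command line.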

  • 2021-01-12 17:53

    The easiest way I've found so far:

    import scrapy
    
    class StackoverflowSpider(scrapy.Spider):
        name = "stackoverflow"
    
        start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']
    
        def parse(self, response):
            for title in response.css(".summary .question-hyperlink::text").getall():
                yield {"Title":title}
    
        def close(self, reason):
            start_time = self.crawler.stats.get_value('start_time')
            finish_time = self.crawler.stats.get_value('finish_time')
            print("Total run time: ", finish_time-start_time)
    
  • 2021-01-12 18:08

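    This could be useful (note that it is written for Python 2 and an old Scrapy release; scrapy.xlib.pydispatch and the global scrapy.stats.stats object are not available in current versions):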

    from scrapy.xlib.pydispatch import dispatcher
    from scrapy import signals
    from scrapy.stats import stats
    from datetime import datetime
    
    def handle_spider_closed(spider, reason):
        print 'Spider closed:', spider.name, stats.get_stats(spider)
        print 'Work time:', datetime.now() - stats.get_stats(spider)['start_time']
    
    
    dispatcher.connect(handle_spider_closed, signals.spider_closed)
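
    On current Scrapy versions, the same idea can be sketched as an extension that connects a handler to the spider_closed signal through the crawler. This is a rough sketch, not a definitive implementation: the class name RunTimeExtension and the work_time attribute are hypothetical, and a start_time without timezone information is assumed to be expressed in UTC.

    ```python
    from datetime import datetime, timezone


    class RunTimeExtension:
        """Prints a spider's elapsed time when it closes."""

        def __init__(self, stats):
            self.stats = stats

        @classmethod
        def from_crawler(cls, crawler):
            # Imported here so the class can be exercised without Scrapy installed.
            from scrapy import signals

            ext = cls(crawler.stats)
            crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
            return ext

        def spider_closed(self, spider, reason):
            start_time = self.stats.get_value("start_time")
            if start_time.tzinfo is None:
                # Assumption: a naive start_time is expressed in UTC.
                now = datetime.now(timezone.utc).replace(tzinfo=None)
            else:
                now = datetime.now(start_time.tzinfo)
            self.work_time = now - start_time
            print("Spider closed:", spider.name, reason)
            print("Work time:", self.work_time)
    ```

    To try it, the class would be listed in the EXTENSIONS setting of the project, for example under a hypothetical path such as "myproject.extensions.RunTimeExtension".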
    