I am using Scrapy to scrape a site.
I wrote a spider, fetched all the items from the page, and saved them to a CSV file.
Now I want to save the total execution time as well.
I'm quite a beginner, but I did it in a slightly simpler way, and I hope it makes sense.
import datetime
Then declare two instance attributes, self.start_time and self.end_time. Inside the constructor of the spider class, record the starting time:
def __init__(self, name=None, **kwargs):
    super().__init__(name, **kwargs)
    self.start_time = datetime.datetime.now()
After that, use the closed method to compute the difference between the end time and the start time:
def closed(self, reason):
    self.end_time = datetime.datetime.now()
    duration = self.end_time - self.start_time
    print(duration)
That's pretty much it. The closed method is called once the spider has finished crawling. See the documentation here.
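The timing logic above can be exercised without Scrapy at all. Below is a minimal sketch in which TimedJob is a hypothetical stand-in for the spider: the start time is recorded in the constructor and the elapsed time is computed in closed(), exactly as in the answer:

```python
from datetime import datetime

class TimedJob:
    """Stand-in for a spider: records the start time in __init__
    and computes the elapsed time in closed()."""

    def __init__(self):
        # Equivalent of setting self.start_time in the spider's constructor
        self.start_time = datetime.now()

    def run(self):
        # Placeholder for the actual crawling/parsing work
        pass

    def closed(self):
        # Scrapy calls closed() when the spider finishes; here we call it by hand
        self.end_time = datetime.now()
        return self.end_time - self.start_time

job = TimedJob()
job.run()
print("Duration:", job.closed())
```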
The easiest way I've found so far:
import scrapy

class StackoverflowSpider(scrapy.Spider):
    name = "stackoverflow"
    start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']

    def parse(self, response):
        for title in response.css(".summary .question-hyperlink::text").getall():
            yield {"Title": title}

    def close(self, reason):
        start_time = self.crawler.stats.get_value('start_time')
        finish_time = self.crawler.stats.get_value('finish_time')
        print("Total run time: ", finish_time - start_time)
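One caveat with the stats approach: get_value('finish_time') returns None if the stats collector has not recorded the finish time by the time your handler runs, which would make the subtraction raise a TypeError. A small defensive helper (my own sketch, not part of the answer above) can fall back to the current time:

```python
from datetime import datetime, timezone

def run_time_from_stats(stats):
    """Elapsed time from a Scrapy-style stats dict.

    Falls back to the current time if 'finish_time' is not set yet,
    and returns None if 'start_time' is missing entirely."""
    start = stats.get("start_time")
    if start is None:
        return None
    finish = stats.get("finish_time") or datetime.now(timezone.utc)
    return finish - start

# Example with a hand-built stats dict; in a real run these values come
# from self.crawler.stats instead.
stats = {
    "start_time": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "finish_time": datetime(2024, 1, 1, 12, 0, 42, tzinfo=timezone.utc),
}
print(run_time_from_stats(stats))  # 0:00:42
```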
This could be useful:
from pydispatch import dispatcher  # scrapy.xlib.pydispatch was removed from Scrapy
from scrapy import signals
from datetime import datetime

def handle_spider_closed(spider, reason):
    # The old global scrapy.stats module is gone; read stats via the crawler
    stats = spider.crawler.stats.get_stats()
    print('Spider closed:', spider.name, stats)
    # start_time is recorded by Scrapy in UTC
    print('Work time:', datetime.utcnow() - stats['start_time'])

dispatcher.connect(handle_spider_closed, signals.spider_closed)