I am using Scrapy to scrape a site.
I wrote a spider, fetched all the items from the page, and saved them to a CSV file.
Now I want to save the total execution time as well.
I'm quite a beginner, but I did it in a slightly simpler way, and I hope it makes sense.
import datetime
Then declare two instance attributes, self.start_time and self.end_time. Inside the constructor of the spider class, record the starting time:
def __init__(self, name=None, **kwargs):
    super().__init__(name, **kwargs)
    self.start_time = datetime.datetime.now()
After that, use the closed method to compute the difference between the end time and the start time:
def closed(self, reason):
    self.end_time = datetime.datetime.now()
    duration = self.end_time - self.start_time
    print(duration)
That's pretty much it. The closed method is called once the spider has finished crawling. See the documentation here.
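The timing logic above can be exercised without Scrapy at all. Below is a minimal sketch in which TimedJob is a hypothetical stand-in for the spider: the start time is recorded in the constructor and the elapsed time is computed in closed(), exactly as in the answer:

```python
from datetime import datetime

class TimedJob:
    """Stand-in for a spider: records the start time in __init__
    and computes the elapsed time in closed()."""

    def __init__(self):
        # Equivalent of setting self.start_time in the spider's constructor
        self.start_time = datetime.now()

    def run(self):
        # Placeholder for the actual crawling/parsing work
        pass

    def closed(self):
        # Scrapy calls closed() when the spider finishes; here we call it by hand
        self.end_time = datetime.now()
        return self.end_time - self.start_time

job = TimedJob()
job.run()
print("Duration:", job.closed())
```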
The easiest way I've found so far:
import scrapy

class StackoverflowSpider(scrapy.Spider):
    name = "stackoverflow"
    start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']

    def parse(self, response):
        for title in response.css(".summary .question-hyperlink::text").getall():
            yield {"Title": title}

    def close(self, reason):
        start_time = self.crawler.stats.get_value('start_time')
        finish_time = self.crawler.stats.get_value('finish_time')
        print("Total run time: ", finish_time - start_time)
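One caveat with the stats approach: get_value('finish_time') returns None if the stats collector has not recorded the finish time by the time your handler runs, which would make the subtraction raise a TypeError. A small defensive helper (my own sketch, not part of the answer above) can fall back to the current time:

```python
from datetime import datetime, timezone

def run_time_from_stats(stats):
    """Elapsed time from a Scrapy-style stats dict.

    Falls back to the current time if 'finish_time' is not set yet,
    and returns None if 'start_time' is missing entirely."""
    start = stats.get("start_time")
    if start is None:
        return None
    finish = stats.get("finish_time") or datetime.now(timezone.utc)
    return finish - start

# Example with a hand-built stats dict; in a real run these values come
# from self.crawler.stats instead.
stats = {
    "start_time": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "finish_time": datetime(2024, 1, 1, 12, 0, 42, tzinfo=timezone.utc),
}
print(run_time_from_stats(stats))  # 0:00:42
```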
This could be useful:
from pydispatch import dispatcher  # scrapy.xlib.pydispatch was removed from Scrapy
from scrapy import signals
from datetime import datetime

def handle_spider_closed(spider, reason):
    # The old global scrapy.stats module is gone; read stats via the crawler
    stats = spider.crawler.stats.get_stats()
    print('Spider closed:', spider.name, stats)
    # start_time is recorded by Scrapy in UTC
    print('Work time:', datetime.utcnow() - stats['start_time'])

dispatcher.connect(handle_spider_closed, signals.spider_closed)