Run a Scrapy spider in a Celery Task

刺人心 2020-11-28 04:13

This is not working anymore; Scrapy's API has changed.

The documentation now features a way to "Run Scrapy from a script", but I get the ReactorNotRestartable error.
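For context, here is roughly what that documented approach looks like; the spider name and URL are placeholders. Calling it a second time in the same long-lived worker process is what triggers the error:

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class MySpider(scrapy.Spider):
        name = 'my_spider'
        start_urls = ['https://example.com']

    def run_spider():
        process = CrawlerProcess()
        process.crawl(MySpider)
        process.start()  # starts the Twisted reactor and blocks until the crawl finishes

    run_spider()
    run_spider()  # second call raises ReactorNotRestartable: the reactor cannot be started again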

4 Answers
  •  臣服心动
    2020-11-28 05:02

    To avoid the ReactorNotRestartable error when running Scrapy in a Celery task queue, I've used threads. It's the same approach used to run the Twisted reactor several times in one app. Scrapy is built on Twisted, so we can do the same thing here.

    Here is the code:

    from threading import Thread
    from scrapy.crawler import CrawlerProcess
    import scrapy
    
    class MySpider(scrapy.Spider):
        name = 'my_spider'
    
    
    class MyCrawler:
    
        spider_settings = {}
    
        def run_crawler(self):
            # Create a fresh CrawlerProcess per run and start the Twisted
            # reactor in a separate thread, so the crawl doesn't block the
            # Celery worker and the reactor can be started for each task.
            process = CrawlerProcess(self.spider_settings)
            process.crawl(MySpider)
            Thread(target=process.start).start()
    
    

    Don't forget to increase CELERYD_CONCURRENCY for Celery:

    CELERYD_CONCURRENCY = 10
    

    It works fine for me.
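
    For completeness, here is a minimal sketch of calling the crawler class above from a Celery task; the app name, broker URL, and import path are assumptions, not part of the original answer:

    from celery import Celery
    from myproject.crawling import MyCrawler  # hypothetical module containing the class above

    app = Celery('tasks', broker='redis://localhost:6379/0')  # placeholder broker URL

    @app.task
    def crawl_task():
        crawler = MyCrawler()
        crawler.spider_settings = {'LOG_LEVEL': 'INFO'}  # any per-run Scrapy settings
        crawler.run_crawler()  # returns immediately; the crawl runs in its own thread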

    This does not block the running process, but Scrapy best practice is to process data in callbacks anyway. Just do it this way:

    # Inject a result callback into each spider before starting the crawl,
    # so scraped data is handed back to the caller from the spider's callbacks.
    for crawler in process.crawlers:
        crawler.spider.save_result_callback = some_callback
        crawler.spider.save_result_callback_params = some_callback_params
    
    Thread(target=process.start).start()
    
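    How the spider consumes these attributes is up to you; here is a hypothetical sketch of a parse method that hands each scraped item to the injected callback (the item fields and the way the params are passed are illustrative only):

    import scrapy

    class MySpider(scrapy.Spider):
        name = 'my_spider'
        start_urls = ['https://example.com']

        def parse(self, response):
            item = {'url': response.url, 'title': response.css('title::text').get()}
            # Hand the scraped item to the externally injected callback
            # instead of (or in addition to) yielding it to Scrapy's pipelines.
            self.save_result_callback(item, self.save_result_callback_params)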
