How do I stop all spiders and the engine immediately after a condition in a pipeline is met?


You can raise a CloseSpider exception to close down a spider. However, I don't think this will work from a pipeline.

EDIT: avaleske notes in the comments to this answer that he was able to raise a CloseSpider exception from a pipeline. The simplest approach would be to do that.
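If you go that route, a minimal sketch of a pipeline that raises CloseSpider directly might look like the following. The item-count condition, the MAX_ITEMS threshold, and the pipeline name are illustrative assumptions, not part of the original answer:

from scrapy.exceptions import CloseSpider

class ItemLimitPipeline:
    """Hypothetical pipeline that stops the crawl after a fixed number of items."""

    MAX_ITEMS = 100  # illustrative threshold, not from the original answer

    def __init__(self):
        self.count = 0

    def process_item(self, item, spider):
        self.count += 1
        if self.count > self.MAX_ITEMS:
            # Ask Scrapy to shut this spider down gracefully
            raise CloseSpider(reason='item limit reached')
        return item

As with any pipeline, it has to be enabled via the ITEM_PIPELINES setting before it will run.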

A similar situation has been described on the Scrapy Users group, in this thread.

I quote:

To close a spider from any part of your code you should use the engine.close_spider method. See this extension for a usage example: https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/closespider.py#L61

You could write your own extension, using closespider.py as an example, that shuts down a spider once a certain condition is met.
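A minimal sketch of such an extension, loosely modelled on closespider.py, is shown below. The CLOSE_ON_ITEM_COUNT setting name, the extension class name, and the item-count condition are assumptions for illustration only:

from scrapy import signals
from scrapy.exceptions import NotConfigured

class CloseOnConditionExtension:
    """Hypothetical extension that closes the spider once a condition is met."""

    def __init__(self, crawler, max_items):
        self.crawler = crawler
        self.max_items = max_items
        self.counter = 0

    @classmethod
    def from_crawler(cls, crawler):
        max_items = crawler.settings.getint('CLOSE_ON_ITEM_COUNT', 0)
        if not max_items:
            raise NotConfigured
        ext = cls(crawler, max_items)
        # Count scraped items via the item_scraped signal
        crawler.signals.connect(ext.item_scraped, signal=signals.item_scraped)
        return ext

    def item_scraped(self, item, spider):
        self.counter += 1
        if self.counter >= self.max_items:
            # engine.close_spider shuts the engine down for this spider
            self.crawler.engine.close_spider(spider, 'condition met')

The extension would also need to be enabled through the EXTENSIONS setting, with the condition adapted to whatever you actually want to check.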

Another "hack" would be to set a flag on the spider in the pipeline. For example:

pipeline:

def process_item(self, item, spider):
    if some_flag:  # some_flag stands for whatever condition should stop the crawl
        spider.close_down = True
    return item

spider:

from scrapy.exceptions import CloseSpider

def parse(self, response):
    if self.close_down:  # flag set by the pipeline
        raise CloseSpider(reason='API usage exceeded')
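Note that with this approach the spider only closes the next time one of its callbacks runs, and close_down should be given a default (for example, close_down = False as a class attribute on the spider) so the first check does not fail with an AttributeError.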