scrapy-signal

Scrapy spider_idle signal not received in my extension

Submitted by ╄→гoц情女王★ on 2020-01-14 10:43:11
Problem: I have common behaviour shared between several spiders when the spider_idle signal is received, and I would like to move this behaviour into an extension. My extension already listens for the spider_opened and spider_closed signals successfully. However, the spider_idle signal is not received. Here is my extension (edited for brevity):

    import logging

    import MySQLdb
    import MySQLdb.cursors
    from scrapy import signals

    logger = logging.getLogger(__name__)

    class MyExtension(object):
        def __init__(self, settings,
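The snippet is cut off before any signal wiring, but in Scrapy an extension only receives the signals it explicitly connects in from_crawler; spider_idle is never delivered automatically. A minimal sketch of an extension that does receive spider_idle (the MYEXT_ENABLED toggle and the handler bodies are illustrative assumptions, not taken from the original post):

    import logging

    from scrapy import signals
    from scrapy.exceptions import NotConfigured

    logger = logging.getLogger(__name__)

    class MyExtension(object):
        def __init__(self, settings):
            self.settings = settings

        @classmethod
        def from_crawler(cls, crawler):
            if not crawler.settings.getbool('MYEXT_ENABLED'):  # hypothetical enable flag
                raise NotConfigured
            ext = cls(crawler.settings)
            # Every signal must be connected explicitly; spider_idle is easy
            # to miss when spider_opened and spider_closed already work.
            crawler.signals.connect(ext.spider_opened, signal=signals.spider_opened)
            crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
            crawler.signals.connect(ext.spider_idle, signal=signals.spider_idle)
            return ext

        def spider_opened(self, spider):
            logger.info('spider opened: %s', spider.name)

        def spider_closed(self, spider):
            logger.info('spider closed: %s', spider.name)

        def spider_idle(self, spider):
            logger.info('spider idle: %s', spider.name)

If spider_opened and spider_closed fire but spider_idle does not, the most likely cause is a missing connect call for that one signal, or a handler attached to the wrong signal object.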

Scrapy spider_idle signal - need to add requests with parse item callback

Submitted by 风流意气都作罢 on 2019-12-02 22:32:02
Problem: In my Scrapy spider I have overridden the start_requests() method in order to retrieve some additional URLs from a database; these represent items potentially missed in the crawl (orphaned items). This should happen at the end of the crawling process. Something like (pseudo-code):

    def start_requests(self):
        for url in self.start_urls:
            yield Request(url, dont_filter=True)

        # attempt to crawl orphaned items
        db = MySQLdb.connect(host=self.settings['AWS_RDS_HOST'],
                             port=self.settings['AWS_RDS_PORT'],
                             user=self.settings['AWS_RDS_USER'],
                             passwd=self.settings['AWS_RDS_PASSWD'],
                             db=self.settings['AWS
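Because start_requests() is consumed up front, it cannot run "at the end of the crawl". The usual pattern for an end-of-crawl pass is a spider_idle handler that schedules the extra requests with the item callback and raises DontCloseSpider so the spider stays open for them. A sketch under assumptions (get_orphaned_urls is a hypothetical helper wrapping the MySQL query; the exact engine.crawl signature varies across Scrapy versions):

    from scrapy import Request, Spider, signals
    from scrapy.exceptions import DontCloseSpider

    class MySpider(Spider):
        name = 'myspider'

        @classmethod
        def from_crawler(cls, crawler, *args, **kwargs):
            spider = super().from_crawler(crawler, *args, **kwargs)
            crawler.signals.connect(spider.on_idle, signal=signals.spider_idle)
            return spider

        def on_idle(self, spider):
            # Run the orphan pass only once, after the normal crawl drains.
            if getattr(self, 'orphans_done', False):
                return
            self.orphans_done = True
            urls = self.get_orphaned_urls()  # hypothetical: SELECT the orphaned urls from MySQL
            for url in urls:
                # Recent Scrapy takes just the request; older versions expect
                # engine.crawl(request, spider).
                self.crawler.engine.crawl(
                    Request(url, callback=self.parse_item, dont_filter=True))
            if urls:
                # Keep the spider alive so the newly scheduled requests run.
                raise DontCloseSpider

        def parse_item(self, response):
            ...  # extract and yield the item here

Passing callback=self.parse_item on the new requests routes the orphaned URLs through the same item-parsing logic as the regular crawl.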
