Question
Just wondering what is the best way to implement this. I have 2 spiders and I want to send an email alert depending on what is scraped after the 2 spiders have finished crawling.
I'm using a script based on the tutorial to run both spiders like so:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
# NqbpSpider and GladstoneSpider are imported from the project's spiders modules

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl(NqbpSpider)
    process.crawl(GladstoneSpider)
    process.start()  # the script will block here until the crawling is finished
Is it best to call an email function after process.start(), or to code an email function into the pipelines.py file under the close_spider method?
def close_spider(self, spider):
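For the first option, the call would simply go right after process.start() returns, which only happens once both spiders have finished. A rough sketch of what is meant (the helper name, addresses, and SMTP host below are placeholders, not anything from the project):

import smtplib
from email.message import EmailMessage

def send_summary_email(subject, body):
    # placeholder helper: send a plain-text alert through an SMTP server
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = "alerts@example.com"   # placeholder sender
    msg["To"] = "you@example.com"        # placeholder recipient
    msg.set_content(body)
    with smtplib.SMTP("localhost") as server:  # placeholder SMTP host
        server.send_message(msg)

# in the run script above, after process.start() returns:
#     send_summary_email("Crawl finished", "Both spiders have finished crawling")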
Answer 1:
You can use the spider_closed signal for this:
from scrapy import signals
from scrapy.spiders import CrawlSpider
from scrapy.xlib.pydispatch import dispatcher  # on Scrapy 2.0+ use: from pydispatch import dispatcher

class MySpider(CrawlSpider):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        dispatcher.connect(self.spider_closed, signals.spider_closed)

    def spider_closed(self, spider):
        # the second param is the instance of the spider about to be closed
        # write the mail-sending part here
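One way to fill in the mail-sending part is Scrapy's built-in MailSender. This is only a sketch: the recipient address and the reliance on the MAIL_* settings in settings.py are assumptions, not part of the original answer.

from scrapy.mail import MailSender

def spider_closed(self, spider):
    # build the mailer from the MAIL_FROM / MAIL_HOST / ... settings in settings.py
    mailer = MailSender.from_settings(spider.settings)
    return mailer.send(
        to=["you@example.com"],              # assumed recipient
        subject="Spider finished: %s" % spider.name,
        body="Add the crawl summary or scraped data here",
    )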
If you want to include the scraped data in the mail, write that part in the pipelines.py file.
class MyPipeline:
    def process_item(self, item, spider):
        if spider.name == 'Name of the spider':
            # use the item data and send the mail from here
            pass
        return item
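Since the question specifically mentions close_spider in pipelines.py, the two ideas can be combined: collect the items in process_item and send one summary mail from close_spider. This is only a sketch; the pipeline name, recipient address, and use of MailSender are assumptions.

from scrapy.mail import MailSender

class EmailAlertPipeline:
    def __init__(self):
        self.items = []

    def process_item(self, item, spider):
        # keep every scraped item so it can go into the summary mail
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # runs once per spider when it finishes crawling
        mailer = MailSender.from_settings(spider.settings)
        body = "\n".join(str(item) for item in self.items)
        return mailer.send(
            to=["you@example.com"],   # assumed recipient
            subject="%s finished with %d items" % (spider.name, len(self.items)),
            body=body,
        )

Keep in mind that close_spider (like the spider_closed signal) runs once per spider, so this sends one mail per spider. For a single alert only after both spiders have finished, calling an email function after process.start() in the run script is the simpler option.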
Source: https://stackoverflow.com/questions/60499455/send-email-alert-using-scrapy-after-multiple-spiders-have-finished-crawling