Send email alert using Scrapy after multiple spiders have finished crawling

Submitted by 谁都会走 on 2020-05-31 03:36:01

Question


Just wondering what the best way to implement this is. I have 2 spiders, and I want to send an email alert, depending on what is scraped, after both spiders have finished crawling.

I'm using a script based on the tutorial to run both spiders like so:

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl(NqbpSpider)
    process.crawl(GladstoneSpider)
    process.start() # the script will block here until the crawling is finished
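
The first option I have in mind would look roughly like this (a minimal sketch: send_alert is a hypothetical helper, and the addresses and SMTP host are placeholders). Since process.start() blocks until both spiders have finished, a single combined alert can go right after it returns; the stdlib smtplib is used here because Scrapy's Twisted reactor is no longer running at that point:

import smtplib
from email.message import EmailMessage

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

def send_alert(body):
    # Hypothetical helper: a blocking SMTP send, which is fine here
    # because the Twisted reactor has stopped once process.start() returns.
    msg = EmailMessage()
    msg["Subject"] = "Scrapy crawl finished"
    msg["From"] = "alerts@example.com"       # placeholder address
    msg["To"] = "you@example.com"            # placeholder address
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:  # placeholder SMTP host
        smtp.send_message(msg)

if __name__ == "__main__":
    process = CrawlerProcess(get_project_settings())
    process.crawl(NqbpSpider)
    process.crawl(GladstoneSpider)
    process.start()  # blocks until both spiders have finished
    send_alert("Both spiders are done.")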

Is it best to call an email function after process.start() (as sketched above), or to code up an email function in the pipelines.py file inside the close_spider hook?

def close_spider(self, spider):


Answer 1


You can use the spider_closed signal:

from scrapy import signals
from scrapy.spiders import CrawlSpider

class MySpider(CrawlSpider):
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # scrapy.xlib.pydispatch was removed in Scrapy 2.0; connect to
        # the signal through the crawler instead.
        spider = super().from_crawler(crawler, *args, **kwargs)
        crawler.signals.connect(spider.spider_closed, signal=signals.spider_closed)
        return spider

    def spider_closed(self, spider):
        # `spider` is the instance of the spider that just closed.
        # Write the mail-sending part here.
        pass
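
For the mail-sending part itself, Scrapy's built-in MailSender uses Twisted's non-blocking IO, so it is safe to call while the crawl is still running (a minimal sketch: the recipient is a placeholder, and the MAIL_* settings are assumed to be configured in settings.py):

from scrapy.mail import MailSender

# inside the MySpider class above:
def spider_closed(self, spider):
    mailer = MailSender.from_settings(spider.settings)
    mailer.send(
        to=["you@example.com"],  # placeholder recipient
        subject=f"{spider.name} finished crawling",
        body="Crawl complete.",
    )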

If you want to include the scraped data in the mail, put that logic in the pipelines.py file.

class MyPipeline:
    # Scrapy pipelines are plain classes; there is no Pipeline base
    # class to inherit from.

    def process_item(self, item, spider):
        if spider.name == 'Name of the spider':
            # Use the data and build the mail content here
            pass
        return item
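
Putting those pieces together, the pipeline can buffer what each spider scrapes and mail a per-spider summary from close_spider (a sketch along the same lines: EmailAlertPipeline is a hypothetical name and the recipient is a placeholder). Note that close_spider fires once per spider, so with two spiders this sends two separate alerts; for a single combined alert, calling the mail code after process.start() returns, as in the question, is the simpler route.

from scrapy.mail import MailSender

class EmailAlertPipeline:
    def open_spider(self, spider):
        self.items = []

    def process_item(self, item, spider):
        self.items.append(item)  # buffer the scraped data for the alert
        return item

    def close_spider(self, spider):
        # Fires once per spider: two spiders means two mails.
        mailer = MailSender.from_settings(spider.settings)
        mailer.send(
            to=["you@example.com"],  # placeholder recipient
            subject=f"{spider.name} finished ({len(self.items)} items)",
            body="\n".join(str(item) for item in self.items),
        )

As with any pipeline, it still has to be enabled via ITEM_PIPELINES in settings.py.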


Source: https://stackoverflow.com/questions/60499455/send-email-alert-using-scrapy-after-multiple-spiders-have-finished-crawling
