Question
So I have a custom decorator called task that captures the status of a function, e.g.:
@task(task_name='tutorial',
      alert_name='tutorial')
def start():
    raw_data = download_data()
    data = parse(raw_data)
    push_to_db(data)

if __name__ == "__main__":
    start()
So here the task decorator monitors the status of the start function: if it fails, it sends an error message to a central monitoring system using alert_name; otherwise it sends a success message.
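For reference, the decorator behaves roughly like this (a simplified sketch; send_alert stands in for the client of our monitoring system, which I have not shown here):

import functools

def task(task_name, alert_name):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                result = func(*args, **kwargs)
            except Exception:
                # send_alert is a placeholder for the central monitoring client
                send_alert(alert_name, task_name, status='failed')
                raise
            send_alert(alert_name, task_name, status='success')
            return result
        return wrapper
    return decorator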
Now I want to add this decorator to my Scrapy spiders to capture their status. But I do not know where it should go, since the spider entry point is not known when starting the spider with this command:
$ scrapy crawl tutorial
I have tried using CrawlerRunner inside the spider's .py file. It goes like this:
from scrapy.crawler import CrawlerRunner

@task(task_name='tutorial',
      alert_name='tutorial')
def start():
    runner = CrawlerRunner()
    runner.crawl(TutorialSpider)

if __name__ == "__main__":
    start()
There are two problems with this:
- Even if TutorialSpider fails, task still reports success. It seems that task can only capture the status of the runner.crawl call itself, which isolates spider errors from the decorator (see the sketch after this list).
- CrawlerRunner is not really meant for this, from my perspective. It should be used for starting multiple spiders at the same time, and I feel something is wrong about using it this way.
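To illustrate the first point: as far as I can tell, runner.crawl() only schedules the crawl and returns a Twisted Deferred immediately, so the decorated function returns before the spider has even run. Roughly (a sketch, assuming TutorialSpider and the task decorator from above are defined/importable in this file):

from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner

@task(task_name='tutorial', alert_name='tutorial')
def start():
    runner = CrawlerRunner()
    d = runner.crawl(TutorialSpider)   # schedules the crawl, returns a Deferred immediately
    # If start() returned here, @task would already report success,
    # no matter what happens inside the spider later.
    d.addBoth(lambda _: reactor.stop())
    reactor.run()  # block until the crawl finishes
    # Even then, exceptions raised inside spider callbacks are caught and
    # logged by Scrapy rather than raised out of start(), so @task still
    # sees a "successful" run.

if __name__ == "__main__":
    start()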
So in summary I have two questions:
- Where should I put this task decorator so that it captures the status of Scrapy spiders?
- Is there a central place where I can add this decorator by default for all spiders when generating a new spider with the scrapy genspider command? I will have over 100 spiders in the future, and adding the decorator to each one would be cumbersome and hard to maintain. Ideally, all I need to do is provide task_name and alert_name as arguments when starting a spider.
Thank you so much for taking your time reading through this question and offering help.
Source: https://stackoverflow.com/questions/59458037/capture-scrapy-spider-running-status-using-an-already-defined-decorator