How to get stats from a scrapy run?


Question


I am running a Scrapy spider from an external file, following the example in the Scrapy docs. I want to grab the stats provided by the Core API and store them in a MySQL table after the crawl is finished.

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy import log, signals
from test.spiders.myspider import *
from scrapy.utils.project import get_project_settings
from test.pipelines import MySQLStorePipeline
import datetime

spider = MySpider()


def run_spider(spider):        
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()
    mysql_insert = MySQLStorePipeline()
    mysql_insert.cursor.execute(
        'insert into crawler_stats(sites_id, start_time, end_time, page_scraped, finish_reason) '
        'values (%s, %s, %s, %s, %s)',
        (1, datetime.datetime.now(), datetime.datetime.now(), 100, 'test'))

    mysql_insert.conn.commit()

run_spider(spider)

How can I get the values of stats like start_time, end_time, pages_scraped, finish_reason in the above code?


Answer 1:


Get them from the crawler.stats collector:

stats = crawler.stats.get_stats()
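
Which keys are present depends on the stats extensions that are enabled; with the default extensions the dictionary includes entries such as 'start_time', 'finish_time', 'finish_reason' and 'item_scraped_count'. A quick way to inspect the values (a sketch, assuming the default stats collector is in use):

stats = crawler.stats.get_stats()
print(stats.get('start_time'))             # datetime when the crawl started
print(stats.get('item_scraped_count', 0))  # number of items scraped so far
print(stats.get('finish_reason'))          # set once the spider closes, e.g. 'finished'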

Example code (collecting stats in the spider_closed signal handler):

def callback(spider, reason):
    stats = spider.crawler.stats.get_stats()  # stats is a dictionary

    # write stats to the database here

    reactor.stop()


def run_spider(spider):        
    settings = get_project_settings()
    crawler = Crawler(settings)
    crawler.signals.connect(callback, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(spider)
    crawler.start()
    log.start()
    reactor.run()


run_spider(spider)
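
To cover the original question, the database write can go inside that spider_closed callback, pulling the values out of the stats dictionary. A minimal sketch, assuming the MySQLStorePipeline from the question exposes cursor and conn and that the crawler_stats table has the columns used there:

import datetime

def callback(spider, reason):
    stats = spider.crawler.stats.get_stats()

    mysql_insert = MySQLStorePipeline()
    mysql_insert.cursor.execute(
        'insert into crawler_stats(sites_id, start_time, end_time, page_scraped, finish_reason) '
        'values (%s, %s, %s, %s, %s)',
        (1,                                                    # sites_id, hard-coded as in the question
         stats.get('start_time'),                              # set by the core stats extension at crawl start
         stats.get('finish_time') or datetime.datetime.now(),  # may not be set yet when this handler runs
         stats.get('item_scraped_count', 0),                   # number of items scraped
         reason))                                              # passed with the spider_closed signal, e.g. 'finished'
    mysql_insert.conn.commit()

    reactor.stop()

Using the reason argument delivered with the signal is safer than reading 'finish_reason' from the stats, since that key is also written by a spider_closed handler and the order in which handlers run is not guaranteed.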


Source: https://stackoverflow.com/questions/27739380/how-to-get-stats-from-a-scrapy-run
