Question
How can I collect stats from within a spider callback?
Example
class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        stats.set_value('foo', 'bar')
Not sure what to import or how to make stats available in general.
Answer 1:
Check out the stats page in the Scrapy documentation. The documentation describes the Stats Collector, but you may need to add from scrapy.stats import stats to your spider code to be able to do stuff with it.
EDIT: At the risk of blowing my own trumpet, if you were after a concrete example, I posted an answer about how to collect failed URLs.
EDIT2: After a lot of googling, apparently no imports are necessary. Just use self.crawler.stats.set_value()!
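For reference, a minimal sketch of the original spider using this approach (the stat names 'foo' and 'pages_crawled' are just examples):

from scrapy import Spider

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # self.crawler is bound to the spider by Spider.from_crawler,
        # so the stats collector is reachable here with no extra import.
        self.crawler.stats.set_value('foo', 'bar')
        # Counters can be incremented as well.
        self.crawler.stats.inc_value('pages_crawled')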
Answer 2:
With Scrapy 0.24, I use stats in the following way:
# Imports for Scrapy 0.24
from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import Selector

class TopSearchesSpider(CrawlSpider):
    name = "topSearches"
    allowed_domains = ["...domain..."]
    start_urls = (
        'http://...domain...',
    )

    def __init__(self, stats):
        super(TopSearchesSpider, self).__init__()
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # Hand the crawler's stats collector to the spider constructor.
        return cls(crawler.stats)

    def parse_start_url(self, response):
        sel = Selector(response)
        url = response.url
        self.stats.inc_value('pages_crawled')
        ...
The super() call invokes the CrawlSpider constructor so that its own initialization code still runs.
Answer 3:
Add this inside your spider class:
def my_parse(self, response):
    print(self.crawler.stats.get_stats())
Answer 4:
If you want to use the stats somewhere other than the spider itself, anywhere you hold a reference to the spider, you can:

spider.crawler.stats.get_stats()
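For example, here is a hypothetical item pipeline sketch (the class name and the stat key read are illustrative) that reaches the stats through the spider reference it receives:

class StatsLoggingPipeline(object):
    def process_item(self, item, spider):
        # The spider carries its crawler, so the stats collector is
        # reachable from any component that holds a spider reference.
        stats = spider.crawler.stats.get_stats()
        spider.log("items so far: %s" % stats.get('item_scraped_count'))
        return item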
Source: https://stackoverflow.com/questions/22951418/how-to-collect-stats-from-within-scrapy-spider-callback