How to collect stats from within scrapy spider callback?

Posted by 倖福魔咒の on 2019-12-07 00:23:07

Question


How can I collect stats from within a spider callback?

Example

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        stats.set_value('foo', 'bar')

Not sure what to import or how to make stats available in general.


Answer 1:


Check out the stats page of the Scrapy documentation. The documentation states that the Stats Collector is always available, but it may be necessary to add from scrapy.stats import stats to your spider code to be able to do anything with it.

EDIT: At the risk of blowing my own trumpet, if you were after a concrete example I posted an answer about how to collect failed urls.

EDIT2: After a lot of googling, apparently no imports are necessary. Just use self.crawler.stats.set_value()!
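
Applied to the question's example, a minimal sketch of that approach looks like this (reusing the question's own placeholder names):

from scrapy import Spider

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # No imports needed: the running crawler (and with it the
        # stats collector) is attached to the spider automatically.
        self.crawler.stats.set_value('foo', 'bar')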




Answer 2:


With Scrapy 0.24 I use the stats in the following way:

from scrapy.selector import Selector
from scrapy.spiders import CrawlSpider  # scrapy.contrib.spiders in Scrapy 0.24

class TopSearchesSpider(CrawlSpider):
    name = "topSearches"
    allowed_domains = ["...domain..."]

    start_urls = (
        'http://...domain...',
    )

    def __init__(self, stats, *args, **kwargs):
        # Call the CrawlSpider constructor so its own setup still runs.
        super(TopSearchesSpider, self).__init__(*args, **kwargs)
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        # Pull the stats collector off the crawler and hand it to the spider.
        return cls(crawler.stats, *args, **kwargs)

    def parse_start_url(self, response):
        sel = Selector(response)
        url = response.url

        self.stats.inc_value('pages_crawled')
        ...

The super() call invokes the CrawlSpider constructor so that its own initialization code still executes.




Answer 3:


Add this inside your spider class

def my_parse(self, response):
    print(self.crawler.stats.get_stats())
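
get_stats() returns a plain dict of everything collected so far; a single entry can also be read with self.crawler.stats.get_value('some_key'), where 'some_key' stands for whichever stat name you are after.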



Answer 4:


If you want to use the stats in another component (for example a pipeline or extension), you can access them through the spider object:

spider.crawler.stats.get_stats()
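
For instance, a minimal pipeline sketch (the class name and the 'items_processed' key here are made up for illustration):

class StatsLoggingPipeline(object):
    def process_item(self, item, spider):
        # The spider carries a reference to its crawler, whose stats
        # collector is shared by the whole crawl.
        spider.crawler.stats.inc_value('items_processed')
        return item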



Source: https://stackoverflow.com/questions/22951418/how-to-collect-stats-from-within-scrapy-spider-callback
