Question
How can I collect stats from within a spider callback?
Example
class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        stats.set_value('foo', 'bar')
Not sure what to import or how to make stats available in general.
Answer 1:
Check out the stats page in the Scrapy documentation. The documentation describes the Stats Collector, but you may need to add from scrapy.stats import stats to your spider code to be able to do stuff with it.
EDIT: At the risk of blowing my own trumpet, if you were after a concrete example, I posted an answer about how to collect failed URLs.
EDIT2: After a lot of googling, apparently no imports are necessary. Just use self.crawler.stats.set_value()!
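For reference, a minimal sketch of the original spider using this approach (the stat names 'foo' and 'pages_crawled' are just examples):

from scrapy import Spider

class MySpider(Spider):
    name = "myspider"
    start_urls = ["http://example.com"]

    def parse(self, response):
        # self.crawler is bound to the spider by Spider.from_crawler,
        # so the stats collector is reachable here with no extra import.
        self.crawler.stats.set_value('foo', 'bar')
        # Counters can be incremented as well.
        self.crawler.stats.inc_value('pages_crawled')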
Answer 2:
With Scrapy 0.24, I use stats in the following way:
# Imports for Scrapy 0.24
from scrapy.contrib.spiders import CrawlSpider
from scrapy.selector import Selector

class TopSearchesSpider(CrawlSpider):
    name = "topSearches"
    allowed_domains = ["...domain..."]
    start_urls = (
        'http://...domain...',
    )

    def __init__(self, stats):
        super(TopSearchesSpider, self).__init__()
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        # Hand the crawler's stats collector to the spider constructor.
        return cls(crawler.stats)

    def parse_start_url(self, response):
        sel = Selector(response)
        url = response.url
        self.stats.inc_value('pages_crawled')
        ...
The super() call invokes the CrawlSpider constructor so that its own initialization code still runs.
Answer 3:
Add this inside your spider class:
def my_parse(self, response):
    print(self.crawler.stats.get_stats())
Answer 4:
If you want to use the stats somewhere other than the spider itself, anywhere you hold a reference to the spider, you can:

spider.crawler.stats.get_stats()
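For example, here is a hypothetical item pipeline sketch (the class name and the stat key read are illustrative) that reaches the stats through the spider reference it receives:

class StatsLoggingPipeline(object):
    def process_item(self, item, spider):
        # The spider carries its crawler, so the stats collector is
        # reachable from any component that holds a spider reference.
        stats = spider.crawler.stats.get_stats()
        spider.log("items so far: %s" % stats.get('item_scraped_count'))
        return item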
Source: https://stackoverflow.com/questions/22951418/how-to-collect-stats-from-within-scrapy-spider-callback