Python Scrapy: passing properties into parser

Submitted by 血红的双手 on 2019-12-25 02:52:15

Question


I'm new to Scrapy and to web scraping in general, so this might be a stupid question (it wouldn't be the first time), but here goes.

I have a simple Scrapy spider, based on the tutorial example, that processes various URLs (in start_urls). I would like to categorise the URLs, e.g. URLs A, B, and C are Category 1, while URLs D and E are Category 2, and then be able to store the category on the resulting Items when the parser processes the response for each URL.

I guess I could have a separate spider for each category, then just hold the category as an attribute on the class so the parser can pick it up from there. But I was kind of hoping I could have just one spider for all the URLs, but tell the parser which category to use for a given URL.

Right now, I'm setting up the URLs in start_urls via my spider's __init__() method. How do I pass the category for a given URL from my __init__() method to the parser, so that I can record the category on the Items generated from the responses for that URL?


Answer 1:


As paul t. suggested:

    from scrapy import Request
    from scrapy.spiders import CrawlSpider

    class MySpider(CrawlSpider):

        def start_requests(self):
            ...
            # Attach the category to each Request via its meta dict
            yield Request(url1, meta={'category': 'cat1'}, callback=self.parse)
            yield Request(url2, meta={'category': 'cat2'}, callback=self.parse)
            ...

        def parse(self, response):
            # Scrapy carries request.meta over to the response
            category = response.meta['category']
            ...

start_requests gives you control over the first URLs the spider visits and lets you attach metadata to each one; that metadata is then available through response.meta in the callback.

The same approach works if you need to pass data from a parse function on to a parse_item callback, for instance.
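To make the data flow concrete, here is a minimal self-contained sketch of the pattern that runs without Scrapy installed. The Request and Response classes below are hypothetical stand-ins for scrapy.Request and scrapy.http.Response, reduced to just the meta behaviour; the URLs and category names are made up for illustration:

```python
# Hypothetical stand-ins for scrapy.Request / scrapy.http.Response,
# kept only to show how meta travels from start_requests() to parse().
class Request:
    def __init__(self, url, meta=None, callback=None):
        self.url = url
        self.meta = meta or {}
        self.callback = callback

class Response:
    def __init__(self, request):
        self.url = request.url
        # Scrapy copies request.meta onto the response it produces
        self.meta = request.meta

# One spider, many URLs: map each URL to its category up front
URL_CATEGORIES = {
    "http://example.com/a": "cat1",
    "http://example.com/b": "cat1",
    "http://example.com/d": "cat2",
}

def start_requests():
    for url, category in URL_CATEGORIES.items():
        yield Request(url, meta={"category": category}, callback=parse)

def parse(response):
    # The category set in start_requests() is available here,
    # so it can be recorded on the resulting item.
    return {"url": response.url, "category": response.meta["category"]}

# Simulate the crawl loop: each response reaches its callback with meta intact
items = [req.callback(Response(req)) for req in start_requests()]
```

In a real spider you would replace the stand-in classes with Scrapy's own, and parse would yield Items instead of plain dicts; the meta mechanics are the same.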



Source: https://stackoverflow.com/questions/21582989/python-scrapy-passing-properties-into-parser
