How to add attributes to a request in a scrapy contract

Submitted by 假装没事ソ on 2020-01-14 03:12:32

Question


A Scrapy contract fails when the callback instantiates an Item or ItemLoader from the meta attribute of the Request() object passed from a previous parse method.

I was thinking of maybe overriding ScrapesContract to preprocess the request and load some dummy values in request.meta, not sure if that is good practice though.

I have seen the pre_process method in the docs (illustrated in the HasHeaderContract at the bottom) to get attributes from the request object, but I'm not sure if it can be used to set attributes.

EDIT: More details. Methods from an example crawler:

def parse_level_one(self, response):
    # populate loader
    return Request(url=url, callback=self.parse_level_two, meta={'loader': loader.load_item()})

def parse_level_two(self, response):
    """Parse product detail page

    @url http://example.com
    @scrapes some_field1 some_field2
    """
    loader = MyItemLoader(response.meta['loader'], response=response)

In the CLI:

$ scrapy check crawlername
Traceback... loader = MyItemLoader(response.meta['loader'], response=response)
KeyError: 'loader'

The idea that I am thinking about is this:

from scrapy.contracts import Contract
from scrapy.exceptions import ContractFail
from scrapy.item import BaseItem


class LoadedScrapesContract(Contract):
    """ Contract to check presence of fields in scraped items
        @loadedscrapes page_name page_body
    """

    name = 'loadedscrapes'

    def pre_process(self, response):
        # MEDDLE WITH THE RESPONSE OBJECT HERE
        # TO ADD A META ATTRIBUTE TO RESPONSE,
        # LIKE AN EMPTY Item() or dict, JUST TO MAKE
        # THE ITEM LOADER INSTANTIATION PASS
        pass

    # this is the same as ScrapesContract
    def post_process(self, output):
        for x in output:
            if isinstance(x, BaseItem):
                for arg in self.args:
                    if arg not in x:
                        raise ContractFail("'%s' field is missing" % arg)

Answer 1:


The best solution I've found for this is to do the following, rather than mucking up the contract:

loader = MyItemLoader(response.meta.get('loader', MyItem()), response=response)

I prefer this method, but to stick to the question: override adjust_request_args in a custom contract.
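A minimal sketch of that second option, assuming a custom contract is acceptable. adjust_request_args receives the dict of keyword arguments that Scrapy will pass to Request(), so a 'meta' entry can be seeded there before the callback runs. The class name LoaderMetaContract, the contract name with_loader_meta, and the empty-dict placeholder are all hypothetical; the try/except stand-in is only there so the snippet can be run without Scrapy installed.

```python
try:
    from scrapy.contracts import Contract
except ImportError:
    # Stand-in so the sketch can be exercised without Scrapy installed.
    class Contract:
        def adjust_request_args(self, args):
            return args


class LoaderMetaContract(Contract):
    """Seed request.meta['loader'] so the callback can build its loader.

    @with_loader_meta
    """

    name = 'with_loader_meta'  # hypothetical contract name

    def adjust_request_args(self, args):
        # args holds the keyword arguments that will be passed to Request().
        meta = dict(args.get('meta') or {})
        # Placeholder value (assumption); an empty MyItem() could be used instead.
        meta.setdefault('loader', {})
        args['meta'] = meta
        return args
```

To try it, the contract would be listed in the SPIDER_CONTRACTS setting (for example {'myproject.contracts.LoaderMetaContract': 10}, where the module path is made up) and @with_loader_meta added to the docstring of parse_level_two next to @url and @scrapes.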



Source: https://stackoverflow.com/questions/27368342/how-to-add-attributes-to-a-request-in-a-scrapy-contract
