How to add Headers to Scrapy CrawlSpider Requests?

Asked by 迷失自我, 2020-12-17 23:50

I'm working with the CrawlSpider class to crawl a website, and I would like to modify the headers that are sent in each request. Specifically, I would like to add the referer header.

2 Answers
  • 2020-12-18 00:02

    You can set the Referer header manually on each request using the `headers` argument:

    yield Request(url=..., callback=..., headers={'Referer': ...})
    

    RefererMiddleware does the same thing automatically, taking the referrer URL from the previous response.
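    What the middleware does can be sketched in plain Python (illustrative names only; this is not Scrapy's actual implementation):

    ```python
    def populate_referer(previous_response_url, request_headers):
        """Mimic RefererMiddleware: if the outgoing request has no Referer
        header yet, fill it in with the URL of the response the request
        was extracted from. An explicit Referer is left untouched."""
        request_headers.setdefault('Referer', previous_response_url)
        return request_headers

    headers = populate_referer('https://example.com/page-1',
                               {'User-Agent': 'my-bot'})
    # headers now contains both User-Agent and the inherited Referer
    ```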

  • 2020-12-18 00:05

    I hate to answer my own question, but I found out how to do it. You have to enable the SpiderMiddleware that will populate the referer on requests. See the documentation for scrapy.contrib.spidermiddleware.referer.RefererMiddleware.

    In short, you need to add this middleware to your project's settings file. (Note: in current Scrapy versions this middleware lives at scrapy.spidermiddlewares.referer.RefererMiddleware and is enabled by default.)

    SPIDER_MIDDLEWARES = {
        'scrapy.contrib.spidermiddleware.referer.RefererMiddleware': True,
    }
    

    Then in your response parsing method you can use response.request.headers.get('Referer', None) to get the referer.
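    Putting it together for a CrawlSpider, where you don't construct the requests yourself, one option is the Rule's `process_request` hook (a sketch, assuming Scrapy 1.7+, where the hook receives both the request and the response; the spider name and URLs are placeholders):

    ```python
    import scrapy
    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    def add_headers(request, response):
        # Merge extra headers into every request this rule extracts.
        # setdefault keeps any Referer that RefererMiddleware already set.
        request.headers.setdefault('Referer', response.url)
        return request

    class HeaderSpider(CrawlSpider):
        name = 'header_spider'                 # placeholder name
        start_urls = ['https://example.com']   # placeholder URL

        rules = (
            Rule(LinkExtractor(), callback='parse_item',
                 process_request=add_headers),
        )

        def parse_item(self, response):
            # The header set above comes back on response.request.
            yield {
                'url': response.url,
                'referer': response.request.headers.get('Referer'),
            }
    ```

    Scrapy normalizes header values to bytes internally, so `headers.get('Referer')` returns a bytes object in the callback.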

    If you don't understand these middlewares right away, read them again, take a break, and then read them again. I found them to be very confusing.
