Scrapy: how to set the Referer URL

旧时难觅i 2020-12-16 17:54

I need to set the Referer URL before scraping a site. The site uses referrer-based authentication, so it does not let me log in unless the Referer is valid.

4 answers
  •  失恋的感觉
    2020-12-16 18:31

    You should do exactly as @warwaruk indicated. Below is my elaboration of that approach for a crawl spider:

    from scrapy.spiders import CrawlSpider
    from scrapy import Request
    
    class MySpider(CrawlSpider):
      name = "myspider"
      allowed_domains = ["example.com"]
      # One URL per list entry.
      start_urls = [
          'http://example.com/foo',
          'http://example.com/bar',
          'http://example.com/baz',
          ]
      rules = [(...)]  # crawl rules elided
    
      def start_requests(self):
        # Send every start URL with an explicit Referer header.
        requests = []
        for item in self.start_urls:
          requests.append(Request(url=item, headers={'Referer': 'http://www.example.com/'}))
        return requests
    
      def parse_me(self, response):
        (...)
    

    This should generate the following log lines in your terminal:

    (...)
    [myspider] DEBUG: Crawled (200) <GET http://example.com/foo> (referer: http://www.example.com/)
    (...)
    [myspider] DEBUG: Crawled (200) <GET http://example.com/bar> (referer: http://www.example.com/)
    (...)
    [myspider] DEBUG: Crawled (200) <GET http://example.com/baz> (referer: http://www.example.com/)
    (...)
    

    The same approach works with BaseSpider (named scrapy.Spider in current Scrapy releases), because start_requests is a BaseSpider method that CrawlSpider inherits.

    The documentation describes further options that can be set on a Request besides headers, such as cookies, the callback function, and the priority of the request.
