Question
While crawling a website like https://www.netflix.com, Scrapy fails with:
Forbidden by robots.txt: <GET https://www.netflix.com/>
ERROR: No response downloaded for: https://www.netflix.com/
Answer 1:
In the new version (Scrapy 1.1, released 2016-05-11), the crawler downloads robots.txt before crawling. To change this behavior, set ROBOTSTXT_OBEY in your settings.py:
ROBOTSTXT_OBEY = False
See the Scrapy 1.1 release notes for details.
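If you only want to ignore robots.txt for a single spider rather than the whole project, Scrapy also lets you override settings per spider via the custom_settings class attribute. A minimal sketch, where the spider name, start URL, and parse logic are placeholders:

    import scrapy

    class ExampleSpider(scrapy.Spider):
        # Hypothetical spider; name and start_urls are placeholders.
        name = "example"
        start_urls = ["https://www.netflix.com/"]

        # Per-spider override: ignore robots.txt for this spider only,
        # without changing the project-wide settings.py.
        custom_settings = {
            "ROBOTSTXT_OBEY": False,
        }

        def parse(self, response):
            # Log the fetched URL to confirm the request went through.
            self.log(f"Fetched {response.url}")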
Answer 2:
The first thing you need to ensure is that you change the user agent in your requests; otherwise the default Scrapy user agent will almost certainly be blocked.
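A minimal sketch of both ways to do this; the browser-like user-agent string below is an illustrative example, not a value from the original answer:

    # settings.py -- project-wide user agent (example value)
    USER_AGENT = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )

Alternatively, pass the header on individual requests:

    import scrapy

    class NetflixSpider(scrapy.Spider):
        name = "netflix"  # hypothetical spider name

        def start_requests(self):
            # Per-request override: send a browser-like User-Agent
            # instead of Scrapy's default one.
            yield scrapy.Request(
                "https://www.netflix.com/",
                headers={
                    "User-Agent": (
                        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                        "AppleWebKit/537.36 (KHTML, like Gecko) "
                        "Chrome/120.0 Safari/537.36"
                    )
                },
                callback=self.parse,
            )

        def parse(self, response):
            # Log the status code to verify the request was not blocked.
            self.log(f"Got {response.status} from {response.url}")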
Source: https://stackoverflow.com/questions/37274835/getting-forbidden-by-robots-txt-scrapy