Scrapy crawler is being blocked and gets 404
Question

I'm trying to scrape the page 'https://zhuanlan.zhihu.com/wangzhenotes' with Scrapy, using the configuration in the post and at the end of this post. The command

scrapy shell 'https://zhuanlan.zhihu.com/wangzhenotes'

gets me

2020-07-02 05:50:04 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://zhuanlan.zhihu.com/robots.txt> (referer: None)
2020-07-02 05:50:04 [protego] DEBUG: Rule at line 19 without any user agent to enforce it on.
...
2020-07-02 05:50:04 [scrapy.core.engine] DEBUG: Crawled