Screen scraping: getting around “HTTP Error 403: request disallowed by robots.txt”

借酒劲吻你 2020-12-12 17:15

Is there a way to get around the following?

httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Is the only way around this to contact the site owner?

8 Answers
  •  温柔的废话
    2020-12-12 17:44

    The error you're receiving is not related to the user agent. By default, mechanize automatically honors a site's robots.txt directives when you navigate to it. Call the .set_handle_robots(False) method on your mechanize.Browser instance to disable this check.
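
    For example, a minimal sketch of the fix (the URL and user-agent string below are placeholders, not from the original post):

        import mechanize

        br = mechanize.Browser()

        # Disable the default robots.txt check that raises
        # "HTTP Error 403: request disallowed by robots.txt"
        br.set_handle_robots(False)

        # Optional: some sites also block mechanize's default user agent,
        # though that is a separate issue from the robots.txt error above
        br.addheaders = [('User-Agent', 'Mozilla/5.0')]

        response = br.open('http://www.example.com')  # placeholder URL
        html = response.read()

    Note that set_handle_robots(False) must be called before the first open(), since the robots.txt check happens as part of the request.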
