Web Crawler - Ignore Robots.txt file?

借酒劲吻你 2020-12-31 07:34

Some servers have a robots.txt file in order to stop web crawlers from crawling through their websites. Is there a way to make a web crawler ignore the robots.txt file?
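For background: robots.txt is purely advisory. It only takes effect when a crawler chooses to check it; nothing on the server enforces it. A minimal sketch of that normal check, using Python's standard urllib.robotparser (the crawler name and URLs here are placeholders for illustration):

    from urllib.robotparser import RobotFileParser

    # Placeholder URLs for illustration.
    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the file

    # A polite crawler asks before fetching each URL.
    allowed = rp.can_fetch("my-crawler", "https://example.com/private/page.html")
    print(allowed)

Ignoring robots.txt simply means skipping this check.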

2 Answers
  •  Happy的楠姐
    2020-12-31 07:55

    The documentation for mechanize has this sample code:

    import mechanize

    br = mechanize.Browser()
    # ... (other browser setup elided in the docs) ...
    # Ignore robots.txt.  Do not do this without thought and consideration.
    br.set_handle_robots(False)
    

    That does exactly what you want.
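    For context, here is a minimal end-to-end sketch built around that call (the target URL and User-Agent string are placeholders, not part of the original answer):

    import mechanize

    br = mechanize.Browser()
    br.set_handle_robots(False)  # skip the robots.txt check
    br.addheaders = [("User-Agent", "my-crawler")]  # still identify your bot

    # Placeholder URL for illustration.
    response = br.open("https://example.com/")
    print(response.read()[:200])

    As the comment in the docs says, do this thoughtfully: robots.txt often exists to protect both the server and the crawler.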
