Screen scraping: getting around “HTTP Error 403: request disallowed by robots.txt”
Is there a way to get around the following? httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt Is the only way around this to contact the site-owner (barnesandnoble.com).. i'm building a site that would bring them more sales, not sure why they would deny access at a certain depth. I'm using mechanize and BeautifulSoup on Python2.6. hoping for a work-around You can try lying about your user agent (e.g., by trying to make believe you're a human being and not a robot) if you want to get in possible legal trouble with Barnes & Noble. Why not instead get in touch with their