Screen scraping: getting around “HTTP Error 403: request disallowed by robots.txt”

借酒劲吻你 2020-12-12 17:15

Is there a way to get around the following?

httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Is the only way around this to contact the site owner (Barnes & Noble) directly?
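
For context, this error is raised by Python's mechanize library, which checks the site's robots.txt by default before fetching a page. A minimal sketch that reproduces it (the URL is an assumption for illustration; any page disallowed by the site's robots.txt behaves the same):

    import mechanize

    br = mechanize.Browser()
    # mechanize honors robots.txt by default, so opening a disallowed
    # page raises:
    #   httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt
    br.open("https://www.barnesandnoble.com/")  # hypothetical URL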

8 Answers
  •  余生分开走 2020-12-12 17:45

    You can try lying about your user agent (e.g., by pretending to be a human being rather than a robot, as sketched below), but only if you want to risk legal trouble with Barnes & Noble. Why not instead get in touch with their business development department and convince them to authorize you specifically? They are no doubt just trying to keep certain classes of robots, such as price-comparison engines, from scraping their site; if you can convince them you're not one, sign a contract, and so on, they may well be willing to make an exception for you.
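
    For concreteness, here is a minimal sketch of what that (discouraged) workaround looks like in mechanize; the user-agent string and URL are illustrative assumptions:

        import mechanize

        br = mechanize.Browser()
        # Stop checking robots.txt -- this is exactly the policy violation
        # the rest of this answer warns against
        br.set_handle_robots(False)
        # Present a browser-like User-Agent instead of mechanize's default
        br.addheaders = [("User-Agent",
                          "Mozilla/5.0 (Windows NT 10.0; Win64; x64)")]
        response = br.open("https://www.barnesandnoble.com/")  # hypothetical URL
        print(response.read()[:200])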

    A "technical" workaround that just breaks their policies as encoded in robots.txt is a high-legal-risk approach that I would never recommend. BTW, how does their robots.txt read?
