The way to detect web scraping

旧巷少年郎 2020-12-30 08:29

I need to detect scraping of the information on my website. I tried detection based on behavior patterns, and it seems promising, although relatively computationally heavy.
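As an illustration of a behavior-based signal that is cheap to compute, here is a minimal sketch that counts each client's requests in a sliding time window and flags clients that exceed a threshold. The class name `SlidingWindowCounter`, the 60-second window, and the limit of 100 requests are hypothetical choices for illustration, not values from the question.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Flags clients that send more than `limit` requests within `window` seconds.

    Hypothetical helper: the default window and limit are illustrative only.
    """

    def __init__(self, window: float = 60.0, limit: int = 100):
        self.window = window
        self.limit = limit
        # client id -> timestamps of that client's recent requests, oldest first
        self._hits = defaultdict(deque)

    def record(self, client_id: str, now: float) -> bool:
        """Record one request at time `now`; return True if the client is over the limit."""
        hits = self._hits[client_id]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        return len(hits) > self.limit
```

Call `counter.record(client_id, time.time())` once per request. Each call does amortized constant work, so this stays cheap even at high request rates; what to use as `client_id` (IP, session cookie, etc.) is left open.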

The

4 Answers

天涯浪人 (OP)

2020-12-30 09:10

OK, someone could build a robot that visits your website, downloads only the HTML (not the images, CSS, etc., as in @hoju's response), and builds a graph of the links to traverse on your site.
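A robot that fetches only the HTML suggests another cheap behavioral signal: clients that request many pages but never any images, CSS, or JavaScript. The following is a minimal sketch, assuming you can classify each request by the extension of its path; the extension list, the `min_pages` threshold, and the function names are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical set of extensions treated as static assets.
ASSET_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif",
                    ".svg", ".webp", ".ico", ".woff", ".woff2"}


def is_asset(path: str) -> bool:
    """Return True if the request path (query string ignored) looks like a static asset."""
    path_only = path.split("?", 1)[0]
    return PurePosixPath(path_only).suffix.lower() in ASSET_EXTENSIONS


def html_only_clients(log, min_pages: int = 20) -> set:
    """Return clients that fetched at least `min_pages` pages and no assets at all.

    `log` is an iterable of (client_id, path) pairs, e.g. parsed from an access log.
    """
    pages = Counter()
    assets = Counter()
    for client_id, path in log:
        if is_asset(path):
            assets[client_id] += 1
        else:
            pages[client_id] += 1
    return {c for c, n in pages.items() if n >= min_pages and assets[c] == 0}
```

Treat the result as a signal rather than proof: browsers with cached assets, or sites that serve assets from a CDN, will also show few or no asset requests in your own logs.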

The robot could wait a random interval between requests and use a different IP for each one via a proxy, a VPN, Tor, etc.

I was tempted to suggest trapping the robot with links hidden via CSS (a common solution found on the Internet), but that is not a real solution. When the robot follows a forbidden link, you can ban its IP, but you would end up with a huge list of banned IPs. Also, if someone started spoofing IPs and requesting that link on your server, you could end up cut off from the world. Besides, a robot could be programmed to recognize the hidden links and avoid them.
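For reference, the hidden-link trap described above comes down to something like the following sketch. The trap path `/trap-link` and the in-memory ban set are hypothetical. Note that the set only ever grows, which is the "huge list of banned IPs" problem, and that every address that requests the trap path gets banned, regardless of who is actually behind it.

```python
# Hypothetical trap URL: linked from your pages but hidden with CSS
# (e.g. display: none) so that human visitors never follow it.
TRAP_PATH = "/trap-link"


class HoneypotBanList:
    """Bans any IP that requests the hidden trap link."""

    def __init__(self, trap_path: str = TRAP_PATH):
        self.trap_path = trap_path
        self.banned = set()  # grows without bound: nothing is ever removed

    def allow(self, ip: str, path: str) -> bool:
        """Return False if the request should be refused."""
        if path == self.trap_path:
            self.banned.add(ip)
        return ip not in self.banned
```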

A more effective approach, I think, would be to check the IP of each incoming request against an API that detects proxies, VPNs, Tor, etc. I searched Google for "api detection vpn proxy tor" and found some (paid) services; there may be free ones as well.
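Rather than guess at any particular service's API, here is a sketch of the same idea against a locally held list of networks, assuming you have obtained CIDR ranges for known proxies, VPNs, or Tor exit nodes from some provider. The function names are hypothetical; only Python's standard `ipaddress` module is used.

```python
import ipaddress


def load_networks(cidrs):
    """Parse CIDR strings (e.g. from a proxy/VPN/Tor list you obtained) into network objects."""
    return [ipaddress.ip_network(c, strict=False) for c in cidrs]


def is_anonymized(ip, networks):
    """Return True if `ip` falls inside any of the listed networks.

    IPv4 addresses never match IPv6 networks (and vice versa); `in` simply returns False.
    """
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

The linear scan is fine for a few thousand ranges; for much larger lists you would want a sorted or prefix-tree structure instead.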

If the API flags the IP, redirect the request to a CAPTCHA.
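The decision step itself can be sketched as a small routing function. This is a minimal illustration, assuming a hypothetical CAPTCHA page at `/captcha` that accepts a `next` parameter, and an `is_flagged` callable that wraps whichever proxy/VPN/Tor check you use.

```python
from urllib.parse import quote

# Hypothetical URL of your CAPTCHA page; `next` tells it where to send the user afterwards.
CAPTCHA_URL = "/captcha"


def route(ip, path, is_flagged):
    """Decide how to handle a request.

    `is_flagged` is any callable that takes an IP and returns True when the
    proxy/VPN/Tor check is positive (e.g. a wrapper around a lookup service).
    Returns ("redirect", url) for flagged clients and ("serve", path) otherwise.
    """
    if is_flagged(ip):
        return ("redirect", f"{CAPTCHA_URL}?next={quote(path, safe='')}")
    return ("serve", path)
```

If the check calls a paid external service, caching its answer per IP for a while keeps the cost and latency from being paid on every request.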
