The way to detect web scraping

旧巷少年郎 2020-12-30 08:29

I need to detect scraping of information on my website. I tried detection based on behavior patterns, and it seems promising, although relatively heavy computationally.

The

4 answers
  •  天涯浪人
    2020-12-30 09:10

    OK, someone could build a robot that visits your website, downloads only the HTML (not the images, CSS, etc., as in @hoju's response) and builds a graph of the links to traverse on your site.

    The robot could space its requests at random intervals and change its IP address on each one using a proxy, a VPN, Tor, etc.

    I was tempted to answer that you could try to trick the robot by adding links hidden with CSS (a common solution found on the Internet). But it is not really a solution. When the robot requests a forbidden link, you can block that IP, but you would end up with a huge list of banned IPs. Also, if someone started spoofing IPs and making requests to that link on your server, you could end up cut off from the rest of the world. On top of that, a robot can be built to recognize the hidden links.
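
    For reference, the hidden-link trap described above can be sketched roughly as follows. This is only an illustration, not a recommendation: `TRAP_PATH`, `BAN_SECONDS` and the `TrapBans` class are hypothetical names, and the expiry time is an assumption added here to keep the ban list from growing without limit. It does nothing about the spoofing or link-recognition problems.

```python
import time

TRAP_PATH = "/do-not-follow"  # hypothetical URL, linked only from a CSS-hidden <a>
BAN_SECONDS = 3600            # assumed ban length; expiry keeps the list bounded


class TrapBans:
    """Remember which IPs requested the trap URL and forget them after a while."""

    def __init__(self, ban_seconds=BAN_SECONDS, clock=time.monotonic):
        self.ban_seconds = ban_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.banned_until = {}      # ip -> time at which the ban expires

    def record(self, ip, path):
        """Call for every request; bans the IP if it asked for the trap URL."""
        if path == TRAP_PATH:
            self.banned_until[ip] = self.clock() + self.ban_seconds

    def is_banned(self, ip):
        """Return True while the ban is active; drop the entry once it expires."""
        expires = self.banned_until.get(ip)
        if expires is None:
            return False
        if self.clock() >= expires:
            del self.banned_until[ip]
            return False
        return True
```

    A request handler would call `record(ip, path)` on every hit and refuse to serve when `is_banned(ip)` is true.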

    A more effective way, I think, would be to check the IP of each incoming request against an API that detects proxies, VPNs, Tor, etc. I searched Google for "api detection vpn proxy tor" and found some (paid) services. There may be free ones as well.

    If the API flags the IP, redirect the request to a CAPTCHA.
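
    A minimal sketch of that flow, under some assumptions: `is_anonymizer` is a hypothetical callable standing in for whichever detection service you pick (I am not describing any real service's API here), and `route_request` and the `passed_captcha` flag are names made up for the example. Serving the page when the lookup fails is also my own choice, so that an outage of the service does not lock out real visitors.

```python
from typing import Callable


def route_request(
    ip: str,
    is_anonymizer: Callable[[str], bool],
    passed_captcha: bool = False,
) -> str:
    """Decide what to do with a request: "serve" the page or show a "captcha".

    is_anonymizer is a placeholder for a call to a proxy/VPN/Tor detection
    service; it should return True when the IP looks like an anonymizer.
    """
    if passed_captcha:
        # The visitor already solved a CAPTCHA in this session.
        return "serve"
    try:
        suspicious = is_anonymizer(ip)
    except Exception:
        # Assumed policy: if the lookup fails, let the request through.
        return "serve"
    return "captcha" if suspicious else "serve"
```

    Since the services I found are paid, you would probably want to cache the result per IP for a while instead of calling the API on every request.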
