Prevent site data from being crawled and ripped

Front-end · unresolved · 12 answers · 917 views

终归单人心 2020-12-15 06:32

I'm looking into building a content site with possibly thousands of different entries, accessible by index and by search.

What are the measures I can take to prevent malicious crawlers from ripping off all the data?

12 Answers
  •  庸人自扰
    2020-12-15 06:39

    In short: you cannot prevent ripping. Malicious bots commonly spoof IE user agents and are fairly intelligent nowadays. If you want your site to be accessible to the widest audience (screen readers, etc.), you cannot rely on JavaScript or one of the popular plugins (Flash), simply because they can block a legitimate user's access.

    Perhaps you could have a cron job that picks a random snippet out of your database and searches Google for it to check for matches. You could then try to get hold of the offending site and demand that they take the content down.
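
    As a minimal sketch of that cron job, assuming a Google Custom Search API key and engine ID in environment variables, and with get_random_snippet() as a hypothetical stand-in for a query against your own database:

        # check_snippet.py -- search the web for a verbatim snippet of your content.
        # Assumes GOOGLE_API_KEY and GOOGLE_CSE_ID environment variables; the domain
        # "example.com" below is a placeholder for your own site.
        import os
        import requests

        def get_random_snippet() -> str:
            # Hypothetical helper: in practice, SELECT one random sentence
            # from your content table here.
            return "one distinctive sentence from your content"

        def find_copies(snippet: str) -> list[str]:
            """Return result URLs (outside your own domain) containing the exact snippet."""
            resp = requests.get(
                "https://www.googleapis.com/customsearch/v1",
                params={
                    "key": os.environ["GOOGLE_API_KEY"],
                    "cx": os.environ["GOOGLE_CSE_ID"],
                    "q": f'"{snippet}"',  # exact-phrase search
                },
                timeout=10,
            )
            resp.raise_for_status()
            items = resp.json().get("items", [])
            return [item["link"] for item in items if "example.com" not in item["link"]]

        if __name__ == "__main__":
            for url in find_copies(get_random_snippet()):
                print("possible rip:", url)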

    You could also monitor the number of requests from a given IP and block it once it passes a threshold, although you may have to whitelist legitimate bots, and this would be of no use against a botnet (but if you are up against a botnet, ripping is probably not your biggest problem).
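
    A rough, framework-agnostic sketch of that per-IP counter, with assumed window and threshold values and an in-memory store (a real deployment would more likely track this in something shared, such as Redis):

        # rate_limit.py -- naive per-IP request counter with a whitelist.
        import time
        from collections import defaultdict, deque

        WINDOW_SECONDS = 60            # assumed sliding window
        MAX_REQUESTS = 120             # assumed per-window threshold
        WHITELIST = {"66.249.66.1"}    # placeholder for known-good crawler IPs

        _hits: dict[str, deque] = defaultdict(deque)

        def allow_request(ip: str) -> bool:
            """Return False once an IP exceeds MAX_REQUESTS within WINDOW_SECONDS."""
            if ip in WHITELIST:
                return True
            now = time.monotonic()
            timestamps = _hits[ip]
            timestamps.append(now)
            # Drop hits that have fallen out of the sliding window.
            while timestamps and now - timestamps[0] > WINDOW_SECONDS:
                timestamps.popleft()
            return len(timestamps) <= MAX_REQUESTS

        # Usage inside a request handler (hypothetical):
        #     if not allow_request(client_ip):
        #         return_status(429)  # Too Many Requests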
