Prevent site data from being crawled and ripped

前端 未结 12 924
终归单人心
终归单人心 2020-12-15 06:32

I\'m looking into building a content site with possibly thousands of different entries, accessible by index and by search.

What are the measures I can take to preven

12条回答
  •  死守一世寂寞
    2020-12-15 06:44

    Between this:

    What are the measures I can take to prevent malicious crawlers from ripping

    and this:

    I wouldn't want to block legitimate crawlers all together.

    you're asking for a lot. Fact is, if you're going to try and block malicious scrapers, you're going to end up blocking all the "good" crawlers too.

    You have to remember that if people want to scrape your content, they're going to put in a lot more manual effort than a search engine bot will... So get your priorities right. You've two choices:

    1. Let the peasants of the internet steal your content. Keep an eye out for it (searching Google for some of your more unique phrases) and sending take-down requests to ISPs. This choice has barely any impact on your apart from the time.
    2. Use AJAX and rolling encryption to request all your content from the server. You'll need to keep the method changing, or even random so each pageload carries a different encryption scheme. But even this will be cracked if somebody wants to crack it. You'll also drop off the face of the search engines and therefore take a hit in traffic of real users.

提交回复
热议问题