How to Block Spam Referrers like darodar.com from Accessing Website?

北海茫月 2020-11-22 16:22

I have several websites that get around 5% of their daily visits from spam referrers. There is one strange thing I noticed about these referrers: they show in Google Analytics, bu

14 answers
  •  庸人自扰
    2020-11-22 17:05

    2019 update

    I may have a solution to this problem, as I find none of the other solutions effective.

    Let me address the problems with the existing solutions first:

    1. Add a filter for each referrer spam domain.
       - How many domains will you add?
       - Most of these referrer spam domains exist for some time and then disappear.
    2. Maintain a blacklist of referrer spam domains.
       - This gets even more complicated, as they are basically endless in number.
       - You would have to keep updating the blacklist.
       - Also, the bigger the blacklist, the more time you need to scan it.
    3. Anything else, such as maintaining an htaccess file by hand, requires manual intervention, which will not scale as your site becomes more popular.
    4. Anything automatic, such as using AI to detect patterns in how referrer spam domains appear, will be hit or miss.

    How do these bots work?

    First, it is crucial to understand how these bots work:

    1. At the very least, they use regex patterns such as /UA-\d{6}/ to harvest tracking IDs from the pages they visit, crawling recursively from a seed website.
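
    To make the scraping step concrete, here is a minimal sketch of what such a bot might do with a page's source. The function name findTrackingIds and the sample page strings are hypothetical, and the pattern is only an extended form of the one quoted above; real bots may well use something different.

```javascript
// Hypothetical sketch of the scraping step: scan raw page source for
// anything shaped like a Universal Analytics tracking ID.
function findTrackingIds(pageSource) {
  // Same shape as the pattern quoted above, extended to capture the full ID.
  var pattern = /UA-\d{6,}-\d+/g;
  return pageSource.match(pattern) || [];
}

// A page that embeds the ID as a literal string is trivially harvested.
var plainPage = "gtag('config', 'UA-111111111-2');";

// A page that only ships char codes contains no matching literal.
var obfuscatedPage =
  "var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];";

console.log(findTrackingIds(plainPage));      // [ 'UA-111111111-2' ]
console.log(findTrackingIds(obfuscatedPage)); // []
```

    A scanner of this kind only sees the static text of the page, which is the weakness the rest of this answer relies on.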

    I believe I have a solution that offers the following advantages:

    1. No need to maintain whitelists or blacklists
    2. Works against 99% of them easily and can always be modified to take it to 100%
    3. Requires almost NO manual intervention

    The idea is to NOT have the tracking ID appear as a literal string anywhere in the script.

    Here is an example

    script.
          //- Google Analytics ID, stored as char codes so the literal never appears in the page source
          var a = [85, 65, 45, 49, 49, 49, 49, 49, 49, 49, 49, 49, 45, 50];
          // Rebuild the tracking ID once at runtime and reuse it below.
          var trackingId = a.map(i => String.fromCharCode(i)).join("");
    
          // Load gtag.js asynchronously with the reconstructed ID.
          var newScript = document.createElement("script");
          newScript.type = "text/javascript";
          newScript.async = true;
          newScript.src = "https://www.googletagmanager.com/gtag/js?id=" + trackingId;
          // document.head is safer than documentElement.firstChild, which may be a comment node.
          document.head.appendChild(newScript);
    
          window.dataLayer = window.dataLayer || [];
          function gtag(){dataLayer.push(arguments);}
          gtag('js', new Date());
          gtag('config', trackingId, { 'send_page_view': false });
          // Feature detects Navigation Timing API support.
          if (window.performance) {
            // Gets the number of milliseconds since page load
            // (and rounds the result since the value must be an integer).
            var timeSincePageLoad = Math.round(performance.now());
            // Sends the timing event to Google Analytics.
            gtag('event', 'timing_complete', {
              'name': 'load',
              'value': timeSincePageLoad,
              'event_category': '#{title}'
            });
          }
    
    1. We take a very simple approach: break the tracking ID, of the form 'UA-1111111-1', into a char code array.

    2. We then construct the tracking ID dynamically from the char code array wherever a reference to it is needed.

    3. The approach can be made arbitrarily more complex: turn the ID into an encrypted bunch of numbers, write them in base 8 or hexadecimal, add a fixed offset or a random offset on each run, or RSA-encrypt the tracking ID with a private key on the server and decrypt it with the public key. The basic approach, however, is REALLY fast, as arrays in JS are really fast, and it can easily beat 99% of the bots.
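
    As a rough sketch of the fixed-offset variant mentioned above, the helpers below generate the shifted char code array and rebuild the ID at runtime. The names encodeId and decodeId and the offset value 7 are made up for illustration and are not part of the original snippet.

```javascript
// Hypothetical helpers: encode a tracking ID as char codes shifted by a
// fixed offset, and decode it again at runtime.
function encodeId(id, offset) {
  return id.split("").map(function (ch) {
    return ch.charCodeAt(0) + offset;
  });
}

function decodeId(codes, offset) {
  return codes.map(function (code) {
    return String.fromCharCode(code - offset);
  }).join("");
}

// Run once, offline, to produce the array that gets pasted into the page.
var OFFSET = 7; // arbitrary illustrative value
var encoded = encodeId("UA-111111111-2", OFFSET);

// In the page itself, only `encoded` and the decoding step appear.
var trackingId = decodeId(encoded, OFFSET);

console.log(encoded.join(", "));
console.log(trackingId); // UA-111111111-2
```

    With an offset of 0 this reduces to the plain char code array used in the example. The offset ships with the page, so this only raises the bar for bots that scan static source; a bot that actually executes the JavaScript would still see the ID.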
