Is it correct to use the JavaScript Array.sort() method for shuffling?

广开言路 2020-11-22 03:48

I was helping somebody out with his JavaScript code when a section that looked like this caught my eye:

function randOrd(){
  return (Math.round(Math         


        
12 Answers
  • 2020-11-22 04:19

    I did some measurements of how random the results of this random sort are...

    My technique was to take a small array [1,2,3,4] and generate all (4! = 24) permutations of it. Then I applied the shuffling function to the array a large number of times and counted how many times each permutation was generated. A good shuffling algorithm would distribute the results quite evenly over all the permutations, while a bad one would not produce that uniform result.

    Using the code below I tested in Firefox, Opera, Chrome, IE6/7/8.

    Surprisingly for me, the random sort and the real shuffle both created equally uniform distributions. So it seems that (as many have suggested) the main browsers are using merge sort. Of course, this doesn't mean that there can't be a browser out there that does it differently, but I would say it means that this random-sort method is reliable enough to use in practice.

    EDIT: This test didn't really measure the randomness, or lack thereof, correctly. See the other answer I posted.

    But on the performance side the shuffle function given by Cristoph was a clear winner. Even for small four-element arrays the real shuffle performed about twice as fast as random-sort!

    // The shuffle function posted by Cristoph.
    var shuffle = function(array) {
        var tmp, current, top = array.length;
    
        if(top) while(--top) {
            current = Math.floor(Math.random() * (top + 1));
            tmp = array[current];
            array[current] = array[top];
            array[top] = tmp;
        }
    
        return array;
    };
    
    // the random sort function
    var rnd = function() {
      return Math.round(Math.random())-0.5;
    };
    var randSort = function(A) {
      return A.sort(rnd);
    };
    
    var permutations = function(A) {
      if (A.length == 1) {
        return [A];
      }
      else {
        var perms = [];
        for (var i=0; i<A.length; i++) {
          var x = A.slice(i, i+1);
          var xs = A.slice(0, i).concat(A.slice(i+1));
          var subperms = permutations(xs);
          for (var j=0; j<subperms.length; j++) {
            perms.push(x.concat(subperms[j]));
          }
        }
        return perms;
      }
    };
    
    var test = function(A, iterations, func) {
      // init permutations
      var stats = {};
      var perms = permutations(A);
      for (var i in perms){
        stats[""+perms[i]] = 0;
      }
    
      // shuffle many times and gather stats
      var start=new Date();
      for (var i=0; i<iterations; i++) {
        var shuffled = func(A);
        stats[""+shuffled]++;
      }
      var end=new Date();
    
      // format result
      var arr=[];
      for (var i in stats) {
        arr.push(i+" "+stats[i]);
      }
      return arr.join("\n")+"\n\nTime taken: " + ((end - start)/1000) + " seconds.";
    };
    
    alert("random sort: " + test([1,2,3,4], 100000, randSort));
    alert("shuffle: " + test([1,2,3,4], 100000, shuffle));
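
One detail of the test above: `func(A)` shuffles the same array in place on every iteration, so each run starts from the previous run's result and not from [1,2,3,4]. Whether that is what the EDIT refers to is an assumption here. A minimal sketch of a variant that shuffles a fresh copy each time (the name `countPermutations` is made up for illustration) might look like this:

```javascript
// The random-sort comparator from the test above.
var rnd = function () {
  return Math.round(Math.random()) - 0.5;
};
var randSort = function (A) {
  return A.sort(rnd);
};

// Hypothetical variant of test(): every iteration shuffles a fresh copy,
// so all runs start from the same initial order.
var countPermutations = function (A, iterations, func) {
  var stats = {};
  for (var i = 0; i < iterations; i++) {
    var key = "" + func(A.slice(0));
    stats[key] = (stats[key] || 0) + 1;
  }
  return stats;
};

var result = countPermutations([1, 2, 3, 4], 100000, randSort);
Object.keys(result).sort().forEach(function (perm) {
  console.log(perm + " " + result[perm]);
});
```

The counts depend on the engine's sort implementation, so no particular output is claimed here.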
    
  • 2020-11-22 04:24

    After Jon has already covered the theory, here's an implementation:

    function shuffle(array) {
        var tmp, current, top = array.length;
    
        if(top) while(--top) {
            current = Math.floor(Math.random() * (top + 1));
            tmp = array[current];
            array[current] = array[top];
            array[top] = tmp;
        }
    
        return array;
    }
    

    The algorithm is O(n), whereas sorting should be O(n log n). Depending on the overhead of executing JS code compared to the native sort() function, this might lead to a noticeable difference in performance, which should increase with array size.


    In the comments to bobobobo's answer, I stated that the algorithm in question might not produce evenly distributed probabilities (depending on the implementation of sort()).

    My argument goes along these lines: A sorting algorithm requires a certain number c of comparisons, e.g. c = n(n-1)/2 for Bubblesort. Our random comparison function makes the outcome of each comparison equally likely, i.e. there are 2^c equally probable results. Now, each result has to correspond to one of the n! permutations of the array's entries, which makes an even distribution impossible in the general case. (This is a simplification, as the actual number of comparisons needed depends on the input array, but the assertion should still hold.)
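
The counting argument can be checked numerically. The following is only a sketch, assuming the Bubblesort count c = n(n-1)/2 from above and a perfectly fair coin per comparison; `factorial` and `canSplitEvenly` are made-up helper names:

```javascript
// Hypothetical helper: n! as a plain number (fine for small n).
function factorial(n) {
  var result = 1;
  for (var i = 2; i <= n; i++) {
    result *= i;
  }
  return result;
}

// True if 2^c equally likely outcomes can be divided evenly among n! permutations.
function canSplitEvenly(n, c) {
  return Math.pow(2, c) % factorial(n) === 0;
}

for (var n = 2; n <= 6; n++) {
  var c = n * (n - 1) / 2; // Bubblesort comparison count used in the text
  console.log("n=" + n + ", 2^c=" + Math.pow(2, c) + ", n!=" + factorial(n) +
              ", even split possible: " + canSplitEvenly(n, c));
}
```

For n ≥ 3, n! contains the factor 3, which never divides a power of two, so the even split fails for every such n regardless of the value of c.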

    As Jon pointed out, this alone is no reason to prefer Fisher-Yates over using sort(), as the random number generator will also map a finite number of pseudo-random values to the n! permutations. But the results of Fisher-Yates should still be better:

    Math.random() produces a pseudo-random number in the range [0;1[. As JS uses double-precision floating point values, this corresponds to 2^x possible values where 52 ≤ x ≤ 63 (I'm too lazy to find the actual number). A probability distribution generated using Math.random() will stop behaving well if the number of atomic events is of the same order of magnitude.

    When using Fisher-Yates, the relevant parameter is the size of the array, which should never approach 2^52 due to practical limitations.

    When sorting with a random comparison function, the function basically only cares whether the return value is positive or negative, so this will never be a problem. But there is a similar one: Because the comparison function is well-behaved, the 2^c possible results are, as stated, equally probable. If c ~ n log n then 2^c ~ n^(a·n) where a = const, which makes it at least possible that 2^c is of the same magnitude as (or even less than) n!, thus leading to an uneven distribution, even if the sorting algorithm were to map onto the permutations evenly. Whether this has any practical impact is beyond me.

    The real problem is that the sorting algorithms are not guaranteed to map onto the permutations evenly. It's easy to see that Mergesort does, as it's symmetric, but reasoning about something like Bubblesort or, more importantly, Quicksort or Heapsort is not easy.


    The bottom line: As long as sort() uses Mergesort, you should be reasonably safe except in corner cases (at least I'm hoping that 2^c ≤ n! is a corner case); if not, all bets are off.

  • 2020-11-22 04:24

    Can you use the Array.sort() function to shuffle an array – Yes.

    Are the results random enough – No.

    Consider the following code snippet:

    var array = ["a", "b", "c", "d", "e"];
    var stats = {};
    array.forEach(function(v) {
      stats[v] = Array(array.length).fill(0);
    });
    //stats = {
    //    a: [0, 0, 0, ...]
    //    b: [0, 0, 0, ...]
    //    c: [0, 0, 0, ...]
    //    ...
    //    ...
    //}
    var i, clone;
    for (i = 0; i < 100; i++) {
      clone = array.slice(0);
      clone.sort(function() {
        return Math.random() - 0.5;
      });
      clone.forEach(function(v, i) {
        stats[v][i]++;
      });
    }
    
    Object.keys(stats).forEach(function(v, i) {
      console.log(v + ": [" + stats[v].join(", ") + "]");
    })

    Sample output:

    a [29, 38, 20,  6,  7]
    b [29, 33, 22, 11,  5]
    c [17, 14, 32, 17, 20]
    d [16,  9, 17, 35, 23]
    e [ 9,  6,  9, 31, 45]
    

    Ideally, the counts should be evenly distributed (for the above example, all counts should be around 20). But they are not. Apparently, the distribution depends on what sorting algorithm is implemented by the browser and how it iterates the array items for sorting.

    More insight is provided in this article:
    Array.sort() should not be used to shuffle an array
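
For comparison, the same position-count measurement can be run against the Fisher-Yates shuffle posted elsewhere in this thread. This is a sketch only; `positionCounts` is a hypothetical helper that wraps the loop from the snippet above:

```javascript
// Fisher-Yates shuffle, as posted elsewhere in this thread.
function shuffle(array) {
  var tmp, current, top = array.length;
  if (top) while (--top) {
    current = Math.floor(Math.random() * (top + 1));
    tmp = array[current];
    array[current] = array[top];
    array[top] = tmp;
  }
  return array;
}

// Hypothetical helper: counts how often each value lands at each index.
function positionCounts(array, iterations, shuffleFn) {
  var stats = {};
  array.forEach(function (v) {
    stats[v] = Array(array.length).fill(0);
  });
  for (var i = 0; i < iterations; i++) {
    var clone = shuffleFn(array.slice(0)); // always start from the same order
    clone.forEach(function (v, index) {
      stats[v][index]++;
    });
  }
  return stats;
}

var counts = positionCounts(["a", "b", "c", "d", "e"], 100, shuffle);
Object.keys(counts).forEach(function (v) {
  console.log(v + ": [" + counts[v].join(", ") + "]");
});
```

With only 100 iterations the counts will still fluctuate around 20; raising `iterations` makes any systematic bias easier to tell apart from noise.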

  • 2020-11-22 04:25

    I think it's fine for cases where you're not picky about distribution and you want the source code to be small.

    In JavaScript (where the source is transmitted constantly), small makes a difference in bandwidth costs.

  • 2020-11-22 04:29

    It's never been my favourite way of shuffling, partly because it is implementation-specific as you say. In particular, I seem to remember that the standard library sorting from either Java or .NET (not sure which) can often detect if you end up with an inconsistent comparison between some elements (e.g. you first claim A < B and B < C, but then C < A).

    It also ends up as a more complex (in terms of execution time) shuffle than you really need.

    I prefer the shuffle algorithm which effectively partitions the collection into "shuffled" (at the start of the collection, initially empty) and "unshuffled" (the rest of the collection). At each step of the algorithm, pick a random unshuffled element (which could be the first one) and swap it with the first unshuffled element - then treat it as shuffled (i.e. mentally move the partition to include it).

    This is O(n) and only requires n-1 calls to the random number generator, which is nice. It also produces a genuine shuffle - any element has a 1/n chance of ending up in each space, regardless of its original position (assuming a reasonable RNG). The sorted version approximates to an even distribution (assuming that the random number generator doesn't pick the same value twice, which is highly unlikely if it's returning random doubles) but I find it easier to reason about the shuffle version :)

    This approach is called a Fisher-Yates shuffle.
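
The partition-based description above could be sketched in JavaScript roughly as follows. The function name `shuffleForward` and its optional `random` parameter are assumptions made for illustration and are not part of the original answer:

```javascript
// Forward Fisher-Yates sketch. `random` is a hypothetical optional parameter
// (defaults to Math.random) that makes the function easy to test.
function shuffleForward(array, random) {
  random = random || Math.random;
  // Indices [0, i) hold the "shuffled" part; [i, length) is still unshuffled.
  for (var i = 0; i < array.length - 1; i++) {
    // Pick a random unshuffled index; j === i (the first one) is allowed.
    var j = i + Math.floor(random() * (array.length - i));
    var tmp = array[i];
    array[i] = array[j];
    array[j] = tmp;
  }
  return array;
}

console.log(shuffleForward([1, 2, 3, 4, 5]));
```

The loop stops one element early because the last unshuffled element has nowhere else to go, which is where the n-1 calls to the random number generator come from.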

    I would regard it as a best practice to code up this shuffle once and reuse it everywhere you need to shuffle items. Then you don't need to worry about sort implementations in terms of reliability or complexity. It's only a few lines of code (which I won't attempt in JavaScript!)

    The Wikipedia article on shuffling (and in particular the shuffle algorithms section) talks about sorting a random projection - it's worth reading the section on poor implementations of shuffling in general, so you know what to avoid.

  • 2020-11-22 04:31

    It is a hack, certainly. In practice, an infinitely looping algorithm is not likely. If you're sorting objects, you could loop through the coords array and do something like:

    for (var i = 0; i < coords.length; i++)
        coords[i].sortValue = Math.random();
    
    coords.sort(useSortValue)
    
    function useSortValue(a, b)
    {
      return a.sortValue - b.sortValue;
    }
    

    (and then loop through them again to remove the sortValue)

    Still a hack though. If you want to do it nicely, you have to do it the hard way :)
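
If attaching a temporary sortValue to the objects is undesirable, the same idea can be sketched without touching them, by pairing each element with a random key, sorting the pairs, and unwrapping again. The name `shuffleByRandomKeys` is made up for this example:

```javascript
// Sketch: sort by a random key that is fixed per element, without mutating
// the elements or the input array.
function shuffleByRandomKeys(array) {
  return array
    .map(function (value) {
      return { key: Math.random(), value: value };
    })
    .sort(function (a, b) {
      return a.key - b.key; // consistent comparator: keys never change
    })
    .map(function (entry) {
      return entry.value;
    });
}

var coords = [{ x: 1 }, { x: 2 }, { x: 3 }, { x: 4 }];
console.log(shuffleByRandomKeys(coords));
```

Because each key is drawn once and then stays fixed, the comparator is consistent, but the cost is still that of a sort, O(n log n), plus the temporary wrapper objects.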
