Longest binary sequence with no equal n-length subsequences

耶瑟儿~ 2020-12-10 20:34

We are looking for an algorithm with the following criteria.

Input is an arbitrary positive integer (n), that represents the length of the compare subse

4 answers
  •  野趣味 (original poster)
    2020-12-10 20:58

    Before finding the FKM algorithm, I fiddled around with a simple recursive algorithm that tries every combination of 0's and 1's and returns the (lexicographically) first result. I found that this method quickly runs out of memory (at least in JavaScript in a browser), so I tried to come up with an improved non-recursive version, based on these observations:
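
    For comparison, the FKM (Fredricksen-Kessler-Maiorana) algorithm mentioned above produces the lexicographically smallest sequence by concatenating, in order, the Lyndon words whose length divides N. What follows is a minimal sketch of the commonly published recursive formulation, restricted to a binary alphabet; it is not code from this answer, and the name `fkmDeBruijn` is made up for illustration. The cyclic result is turned into a linear string by appending its first N-1 characters.

```javascript
// Sketch of the FKM construction (binary alphabet only).
// fkmDeBruijn is a hypothetical name, not part of the original answer.
function fkmDeBruijn(n) {
    var a = new Array(n + 1).fill(0); // a[1..n] holds the word being built; a[0] stays 0
    var out = [];

    // t: next position to fill, p: length of the current candidate Lyndon prefix
    function db(t, p) {
        if (t > n) {
            // emit the prefix only if its length divides n
            if (n % p === 0) {
                for (var j = 1; j <= p; j++) out.push(a[j]);
            }
            return;
        }
        a[t] = a[t - p];                  // repeat the prefix
        db(t + 1, p);
        for (var v = a[t - p] + 1; v < 2; v++) {
            a[t] = v;                     // or break the repetition with a larger symbol
            db(t + 1, t);
        }
    }

    db(1, 1);
    var cyclic = out.join("");
    return cyclic + cyclic.slice(0, n - 1); // unroll the cycle into a linear string
}

console.log(fkmDeBruijn(5));
```

    The recursion depth here is only N, so it should not hit the memory problems of a brute-force search. For N=5 it reproduces the 36-character string shown in the N=5 example further down.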

    • By running through the N-length binary strings from 0 to 2^N - 1, checking whether each is already present in the sequence and, if not, whether it partially overlaps the end of the sequence, you can build the lexicographically smallest binary De Bruijn sequence in N-length chunks instead of bit by bit.

    • You only need to go through the N-length binary strings up to 2^(N-1) - 1, and then append 2^(N-1) without overlap. The N-length strings starting with a '1' need not be checked.

    • You can skip the even numbers greater than 2; they are bit-shifted versions of smaller numbers that are already in the sequence. The number 2 is needed to avoid 1 and 3 incorrectly overlapping; code-wise, you can fix this by starting the sequence with 0, 1 and 2 already in place (e.g. 0000010 for N=5) and then iterating over every odd number starting at 3.
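
    The chunk-wise step described in these observations (skip a chunk that is already present, otherwise reuse the longest overlap with the end of the sequence) can be sketched in isolation. The helper name `appendWithOverlap` is an assumption made for this illustration only.

```javascript
// appendWithOverlap is a hypothetical helper name used only for this sketch.
function appendWithOverlap(seq, part) {
    if (seq.indexOf(part) !== -1) return seq;      // already present: nothing to add
    for (var j = part.length - 1; j > 0; j--) {     // try the longest overlap first
        if (seq.slice(-j) === part.slice(0, j)) {
            return seq + part.slice(j);             // reuse the j overlapping characters
        }
    }
    return seq + part;                              // no overlap: append the whole chunk
}

var seq = "0000010";                     // 0, 1 and 2 already in place for N=5
seq = appendWithOverlap(seq, "00011");   // 3 overlaps the trailing "0"
seq = appendWithOverlap(seq, "00100");   // 4 is already present, so nothing changes
console.log(seq);                        // prints "00000100011"
```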

    Example for N=5:

     0    00000
     1     00001
     2      00010
     3          00011
     4      (00100)
     5               00101
     6          (00110)
     7                    00111
     8       (01000)
     9                 (01001)
    10               (01010)
    11                         01011
    12           (01100)
    13                           01101
    14                    (01110)
    15                              01111
                                        +10000
    =>    000001000110010100111010110111110000
    

    As you can see, the sequence is built with the strings 00000 to 01111 and the appended 10000, and the strings 10001 to 11111 need not be checked. All even numbers greater than 2 can be skipped (as could the numbers 9 and 13).
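
    A result like the one above can be sanity-checked by sliding an N-length window over it and confirming that all 2^N windows are distinct; a linear sequence with that property has exactly 2^N + N - 1 characters. This is a small sketch, and `isLinearDeBruijn` is a name chosen here for illustration.

```javascript
// isLinearDeBruijn is a hypothetical checker, not part of the original answer.
function isLinearDeBruijn(seq, n) {
    if (!/^[01]+$/.test(seq)) return false;              // binary strings only
    var total = Math.pow(2, n);
    if (seq.length !== total + n - 1) return false;      // 2^N windows need 2^N + N - 1 characters
    var seen = new Set();
    for (var i = 0; i + n <= seq.length; i++) {
        var chunk = seq.slice(i, i + n);
        if (seen.has(chunk)) return false;               // a repeated window breaks the property
        seen.add(chunk);
    }
    return seen.size === total;                          // every N-length string appears once
}

console.log(isLinearDeBruijn("000001000110010100111010110111110000", 5)); // prints true
```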

    This code example shows a simple implementation in JavaScript. It's fast up to N=14 or so, and will give you all 1,048,595 characters for N=20 if you have a few minutes.

    function binaryDeBruijn(n) {
        var zeros = "", max = Math.pow(2, n - 1);             // check only up to 2^(N-1)
        for (var i = 1; i < n; i++) zeros += "0";             // N-1 leading zeros
        var seq = zeros + (n > 2 ? "010" : n > 1 ? "01" : "0"); // start with 0-2 precalculated (0-1 for N=2)
        for (var i = 3; i < max; i += 2) {                    // odd numbers from 3
            var part = (zeros + i.toString(2)).slice(-n);     // binary with leading zeros
            if (seq.indexOf(part) == -1) {                    // part not already in sequence
                for (var j = n - 1; j > 0; j--) {             // try partial match at end
                    if (seq.slice(-j) == part.slice(0, j)) break; // partial match found
                }
                seq += part.slice(j);                         // overlap with end (j > 0) or append whole (j == 0)
            }
        }
        return seq + "1" + zeros;                             // append 2^(N-1)
    }
    
    document.write(binaryDeBruijn(10));

    There are other numbers besides the even numbers which could be skipped (e.g. the numbers 9 and 13 in the example); if you could predict these numbers, this would of course make the algorithm much more efficient, but I'm not sure there's an obvious pattern there.
