Longest binary sequence with no equal n-length subsequences

耶瑟儿~ 2020-12-10 20:34

We are looking for an algorithm with the following criteria.

Input is an arbitrary positive integer (n), that represents the length of the compare subse

4 answers
  • 2020-12-10 20:58

    Before finding the FKM algorithm, I fiddled around with a simple recursive algorithm that tries every combination of 0's and 1's and returns the (lexicographically) first result. I found that this method quickly runs out of memory (at least in JavaScript in a browser), so I tried to come up with an improved non-recursive version, based on these observations:

    • By running through the N-length binary strings from 0 to 2^N - 1, checking whether each is already present in the sequence and, if not, whether it partially overlaps the end of the sequence, you can build up the lexicographically smallest binary De Bruijn sequence in N-length chunks instead of bit by bit.

    • You only need to go through the N-length binary strings up to 2^(N-1) - 1, and then append 2^(N-1) without overlap. The N-length strings starting with a '1' need not be checked.

    • You can skip the even numbers greater than 2; they are bit-shifted versions of smaller numbers that are already in the sequence. The number 2 is needed to avoid 1 and 3 incorrectly overlapping; code-wise, you can fix this by starting the sequence with 0, 1 and 2 already in place (e.g. 0000010 for N=5) and then iterating over every odd number starting at 3.

    Example for N=5:

     0    00000
     1     00001
     2      00010
     3          00011
     4      (00100)
     5               00101
     6          (00110)
     7                    00111
     8       (01000)
     9                 (01001)
    10               (01010)
    11                         01011
    12           (01100)
    13                           01101
    14                    (01110)
    15                              01111
                                        +10000
    =>    000001000110010100111010110111110000
    

    As you can see, the sequence is built with the strings 00000 to 01111 and the appended 10000, and the strings 10001 to 11111 need not be checked. All even numbers greater than 2 can be skipped (as could the numbers 9 and 13).

    This code example shows a simple implementation in JavaScript. It's fast up to N=14 or so, and will give you all 1,048,595 characters for N=20 if you have a few minutes.

    function binaryDeBruijn(n) {
        var zeros = "", max = Math.pow(2, n - 1);             // check only up to 2^(N-1)
        for (var i = 1; i < n; i++) zeros += "0";
        var seq = zeros + (n > 2 ? "010" : "0");              // start with 0-2 precalculated
        for (var i = 3; i < max; i += 2) {                    // odd numbers from 3
            var part = (zeros + i.toString(2)).substr(-n, n); // binary with leading zeros
            if (seq.indexOf(part) == -1) {                    // part not already in sequence
                for (var j = n - 1; j > 0; j--) {             // try partial match at end
                    if (seq.substr(-j, j) == part.substr(0, j)) break; // partial match found
                }
                seq += part.substr(j, n);                     // overlap with end or append
            }
        }
        return seq + "1" + zeros;                             // append 2^(N-1)
    }
    
    document.write(binaryDeBruijn(10));

    There are other numbers besides the even numbers which could be skipped (e.g. the numbers 9 and 13 in the example); if you could predict these numbers, this would of course make the algorithm much more efficient, but I'm not sure there's an obvious pattern there.
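
    A quick way to sanity-check a result like the one above is to slide an N-character window over it and confirm that no window repeats and that all 2^N windows occur. The helper below is only a sketch and not part of the original answer; the name `hasEveryWindowOnce` is made up for illustration.

```javascript
// Returns true if every n-length window of seq is distinct
// and all 2^n possible binary windows are present.
function hasEveryWindowOnce(seq, n) {
    var seen = new Set();
    for (var i = 0; i + n <= seq.length; i++) {
        var win = seq.slice(i, i + n);       // window starting at position i
        if (seen.has(win)) return false;     // repeated window
        seen.add(win);
    }
    return seen.size === Math.pow(2, n);     // nothing missing
}

// The N=5 example string from above passes the check.
console.log(hasEveryWindowOnce("000001000110010100111010110111110000", 5));
```

    Using a Set keeps the check linear in the number of windows, which matters once N approaches 20.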

  • 2020-12-10 21:05

    Googling for binary De Bruijn sequence algorithms, I found this one where you can actually tell what's happening. Known as the "FKM algorithm" (after Fredricksen, Kessler and Maiorana), it finds the lexicographically least De Bruijn sequence using the "necklace prefix" method. I'll explain using the example with n=4.

    First, create all binary sequences of length n, i.e. all numbers from 0 to 2^n - 1:

    0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111

    Then, remove the sequences which are not in their lowest rotation, e.g. 0110 can be rotated to 0011 which is smaller:

    0000, 0001, 0011, 0101, 0111, 1111

    (You'll notice that this removes, among others, all even numbers except 0000, and all numbers greater than 0111 except 1111, which helps to simplify the code.)

    Then reduce the sequences to their "aperiodic prefix", i.e. if they are a repetition of a shorter sequence, use that shorter sequence; e.g. 0101 is a repetition of 01, 1111 is a repetition of 1:

    0, 0001, 0011, 01, 0111, 1

    Join the sequences, and you have a De Bruijn sequence:

    0000100110101111

    For a non-circular sequence, add n-1 zeros:

    0000100110101111000

    (further information: F. Ruskey, J. Sawada, A. Williams: "De Bruijn Sequences for Fixed-Weight Binary Strings" and B. Stevens, A. Williams: "The Coolest Order Of Binary Strings", from: "Fun With Algorithms", 2012, pp. 327-328)
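
    The steps above can be followed literally in a few lines. This is a naive sketch for small n, not the optimised FKM algorithm: it enumerates all 2^n strings and tests every rotation. The function names are made up for illustration.

```javascript
// True if s is the lexicographically smallest of all its rotations.
function isLowestRotation(s) {
    for (var r = 1; r < s.length; r++) {
        if (s.slice(r) + s.slice(0, r) < s) return false;
    }
    return true;
}

// Shortest prefix whose repetition gives s, or s itself if there is none.
function shortestPeriod(s) {
    for (var len = 1; len <= s.length / 2; len++) {
        if (s.length % len === 0 &&
            s.slice(0, len).repeat(s.length / len) === s) {
            return s.slice(0, len);
        }
    }
    return s;
}

// Join the reduced lowest rotations in increasing order,
// then add n-1 zeros for the non-circular version.
function deBruijnByNecklaces(n) {
    var seq = "";
    for (var v = 0; v < Math.pow(2, n); v++) {
        var s = v.toString(2).padStart(n, "0");   // binary with leading zeros
        if (isLowestRotation(s)) seq += shortestPeriod(s);
    }
    return seq + "0".repeat(n - 1);
}

console.log(deBruijnByNecklaces(4));   // 0000100110101111000
```

    Because it visits every string and every rotation, this version does roughly n * 2^n string comparisons, so it is only useful for checking the method on small n.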


    I was curious to see how FKM performed compared to my other algorithm, so I wrote this rather clumsy JavaScript implementation. It is indeed much faster, and generates the 1,048,595 digit sequence for N=20 in under a second. In a serious language this should be very fast.

    function DeBruijnFKM(n) {
        var seq = "0";                                         // start with 0 precalculated
        for (var i = 1; i < n; i++) {                      // i = number of significant bits
            var zeros = "", max = Math.pow(2, i);
            for (var j = n; j > i; j--) zeros += "0";                   // n-i leading zeros
            for (var k = i > 1 ? max / 2 + 1 : 1; k < max; k += 2) {     // odd numbers only
                var bin = k.toString(2);                           // bin = significant bits
                if (isSmallestRotation(zeros, bin)) {
                    seq += aperiodicPrefix(zeros, bin);
                }
            }
        }
        return seq + Math.pow(2, n - 1).toString(2);      // append 2^N-1 and trailing zeros
    
        function isSmallestRotation(zeros, bin) {
            var len = 0, pos = 1;   // len = number of consecutive zeros in significant bits
            for (var i = 1; i < bin.length; i++) {
                if (bin.charAt(i) == "1") {
                    if (len > zeros.length) return false;   // more zeros than leading zeros
                    if (len == zeros.length
                    && zeros + bin > bin.substr(pos) + zeros + bin.substr(0, pos)) {
                        return false;                              // smaller rotation found
                    }
                    len = 0;
                    pos = i + 1;
                }
                else ++len;
            }
            return true;
        }
    
        function aperiodicPrefix(zeros, bin) {
            if (zeros.length >= bin.length) return zeros + bin;    // too many leading zeros
            bin = zeros + bin;
            for (var i = 2; i <= bin.length / 2; i++) {  // skip 1; not used for 0 and 2^N-1
                if (bin.length % i) continue;
                var pre = bin.substr(0, i);                      // pre = prefix of length i
                for (var j = i; j < bin.length; j += i) {
                    if (pre != bin.substr(j, i)) break;              // non-equal part found
                }
                if (j == bin.length) return pre;                      // all parts are equal
            }
            return bin;                                               // no repetition found
        }
    }
    
    document.write(DeBruijnFKM(10));

  • 2020-12-10 21:14

    An n-bit linear feedback shift register (LFSR) running at maximum period meets most of the requirements, because its state is exactly as wide as the test window. If any n-bit pattern occurred more than once, the register would have returned to an earlier state and its period would be shorter than the maximum.

    Unfortunately an LFSR cannot run with a state of zero, so the all-zero window never appears on its own. To overcome this, simply prepend n zeroes to the bit string.

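    #include <stdint.h>
    #include <stdio.h>

    int parity(uint64_t m);                        /* defined below */

    /* Assumed output helper: prints one bit as '0' or '1'. */
    static void emit(int bit) { putchar('0' + bit); }

    void generate(int n) {                         /* table as given covers 1 <= n <= 15 */
      static const uint64_t polytab[64] = {
        0x2, 0x2, 0x6, 0xc,
        0x18, 0x28, 0x60, 0xc0,
        0x170, 0x220, 0x480, 0xa00,
        0x1052, 0x201a, 0x402a, 0xc000,
        /* table can be completed from:
         * http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
         */
      };
      uint64_t poly = polytab[n];                  /* feedback taps for an n-bit register */
      uint64_t m = ~(~UINT64_C(1) << (n - 1));     /* mask of the low n bits, unsigned shift */
      uint64_t s = 1;                              /* non-zero start state */
      int i;
      for (i = 0; i < n; i++) emit(0);             /* the all-zero window the LFSR skips */
      do {
        emit(s & 1);
        s <<= 1;
        s = (s + parity(s & poly)) & m;
      } while (s != 1);                            /* stop after one full period */
    }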
    

    If you need a test window longer than 64 bits then just use 64 bits (or if you must you can extend the arithmetic to 128 bits). Beyond 64 bits some other resource will be exhausted before it is discovered that the bit string is not maximum length.

    For completeness, a parity function:

    int parity(uint64_t m) {
      int p = 0;
      while (m != 0) {
        m &= m - 1;
        p ^= 1;
      }
      return p;
    }
    

    Outputs for n=3, 4, and 5:

    3: 0001011100
    4: 0000100110101111000
    5: 000001001011001111100011011101010000
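
    For comparison with the JavaScript answers, here is a rough port of the same loop. It is a sketch that only knows the three tap masks for n = 3, 4 and 5, copied from polytab[3..5] in the C code above, and it relies on 32-bit bitwise operators, so it is not meant for large n. The names `TAPS` and `lfsrSequence` are made up for illustration.

```javascript
// Tap masks copied from polytab[3], polytab[4], polytab[5] in the C code.
var TAPS = { 3: 0xc, 4: 0x18, 5: 0x28 };

// Parity of the set bits in x (1 if odd, 0 if even).
function parity(x) {
    var p = 0;
    while (x !== 0) {
        x &= x - 1;          // clear the lowest set bit
        p ^= 1;
    }
    return p;
}

function lfsrSequence(n) {
    var poly = TAPS[n];
    if (poly === undefined) throw new RangeError("no tap mask for n=" + n);
    var mask = (1 << n) - 1;             // keep the low n bits
    var out = "0".repeat(n);             // the all-zero window the LFSR skips
    var s = 1;
    do {
        out += (s & 1);                  // emit the low bit
        s <<= 1;
        s = (s + parity(s & poly)) & mask;
    } while (s !== 1);                   // stop after one full period
    return out;
}

console.log(lfsrSequence(3));   // 0001011100
```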
    
  • 2020-12-10 21:15

    Using the free MiniZinc constraint solver, you can write the search for a given sequence length as follows:

    int: n = 3;
    int: k = pow(2,n)+n-1;
    
    array[1..k] of var 0..1: a;
    
    constraint
      forall (i in 1..k-n) (
        forall (j in i+1..k-n+1) (
          exists (x in 0..n-1)(
            a[i+x] != a[j+x]
          )
        )
      );
    
    solve satisfy;
    
    output [show(a[m]) | m in 1..k];
    

    For n=3, the longest sequence is

    1110100011

    k=11 yields UNSATISFIABLE

    It took 71ms to find the sequence on k=10 bits for sub-sequence length n=3. For sub-sequence length n=9, the total sequence of 520 bits was found in 6.1s.
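
    The value of k in the model, and the reason k=11 fails for n=3, comes from counting windows: a sequence of k bits contains k - n + 1 windows of length n, and there are only 2^n distinct n-bit strings, so k can be at most 2^n + n - 1. A small sketch of that arithmetic (function names are made up for illustration):

```javascript
// Number of n-length windows in a sequence of k bits.
function windowCount(k, n) {
    return k - n + 1;
}

// Longest sequence in which all n-length windows can still be distinct.
function maxSequenceLength(n) {
    return Math.pow(2, n) + n - 1;
}

console.log(maxSequenceLength(3));   // 10
console.log(maxSequenceLength(9));   // 520
console.log(windowCount(11, 3));     // 9 windows, but only 8 distinct 3-bit strings
```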
