Longest binary sequence with no equal n-length subsequences

Submitted by 白昼怎懂夜的黑 on 2020-01-09 11:18:26

Question


We are looking for an algorithm with the following criteria.

The input is an arbitrary positive integer n, the length of the subsequences to compare.

We are looking for the longest binary sequence that contains no two equal n-length subsequences (contiguous blocks of n bits). Equal matches may overlap (the variant in which matches must be disjoint is also an interesting problem). The output is this sequence of bits.
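Every n-bit window of a valid sequence must differ from every other window, and only 2^n distinct n-bit patterns exist. A simple count therefore bounds the length L of any valid sequence:

```latex
\underbrace{L - n + 1}_{\text{windows of length } n} \;\le\; \underbrace{2^n}_{\text{distinct } n\text{-bit patterns}}
\quad\Longrightarrow\quad L \le 2^n + n - 1
```

For n = 3 this gives L ≤ 10.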

For example, if n = 3:

10111010 is invalid because the subsequence 101 occurs twice. 01010 is also invalid because 010 occurs twice (the two occurrences overlap). 01101001 is valid, but evidently not the longest possible sequence.
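The validity rule can be checked directly by sliding a window of length n along the string and remembering every window seen so far. Here is a minimal JavaScript sketch; the function name `isValid` is made up for illustration. It reproduces the three examples above.

```javascript
// Returns true when no n-length window (contiguous block) of `bits` occurs twice.
// Overlapping occurrences count as repeats, as in the question.
function isValid(bits, n) {
  const seen = new Set();
  for (let i = 0; i + n <= bits.length; i++) {
    const w = bits.slice(i, i + n);
    if (seen.has(w)) return false; // this window already appeared earlier
    seen.add(w);
  }
  return true;
}

console.log(isValid("10111010", 3)); // false (101 repeats)
console.log(isValid("01010", 3));    // false (010 repeats, overlapping)
console.log(isValid("01101001", 3)); // true
```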


Answer 1:


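Googling for binary De Bruijn sequence algorithms, I found this one, in which you can actually follow what's happening. (A binary De Bruijn sequence of order n is a cyclic bit string of length 2^n that contains every n-bit pattern exactly once as a window. Written out linearly, with its first n-1 bits repeated at the end, it is a longest sequence with the property you are asking for.) Known as the "FKM algorithm" (after Fredricksen, Kessler and Maiorana), it finds the lexicographically least De Bruijn sequence using the "necklace prefix" method. I'll explain using the example with n=4.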

First, create all binary sequences of length n, i.e. all numbers from 0 to 2^n - 1:

0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111

Then, remove the sequences which are not in their lowest rotation, e.g. 0110 can be rotated to 0011 which is smaller:

0000, 0001, 0011, 0101, 0111, 1111

(You'll notice that, among other things, this removes all even numbers except 0000 and all numbers greater than 0111 except 1111, which helps to simplify the code.)
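The parenthetical remark can be checked by brute force for small n. This sketch (function names are illustrative) keeps the n-bit strings that are their own smallest rotation, reproduces the n=4 list above, and tests the remark for n = 1 to 10. It also holds in general: a smallest rotation that contains both bits must start with 0 and end with 1.

```javascript
// Smallest rotation of a bit string: the representative of its necklace.
function smallestRotation(s) {
  let best = s;
  for (let r = 1; r < s.length; r++) {
    const rot = s.slice(r) + s.slice(0, r);
    if (rot < best) best = rot;
  }
  return best;
}

// Values 0 .. 2^n - 1 whose n-bit form is already its own smallest rotation.
function necklaceValues(n) {
  const result = [];
  for (let v = 0; v < 2 ** n; v++) {
    const s = v.toString(2).padStart(n, "0");
    if (smallestRotation(s) === s) result.push(v);
  }
  return result;
}

// The remark: apart from 0 and 2^n - 1, every survivor is odd and below 2^(n-1).
function remarkHolds(n) {
  return necklaceValues(n).every(
    (v) => v === 0 || v === 2 ** n - 1 || (v % 2 === 1 && v < 2 ** (n - 1))
  );
}

console.log(necklaceValues(4).map((v) => v.toString(2).padStart(4, "0")).join(", "));
// 0000, 0001, 0011, 0101, 0111, 1111
console.log([1, 2, 3, 4, 5, 6, 7, 8, 9, 10].every(remarkHolds)); // true
```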

Then reduce the sequences to their "aperiodic prefix", i.e. if they are a repetition of a shorter sequence, use that shorter sequence; e.g. 0101 is a repetition of 01, 1111 is a repetition of 1:

0, 0001, 0011, 01, 0111, 1

Join the sequences, and you have a De Bruijn sequence:

0000100110101111

For a non-circular sequence, append n-1 zeros (the first n-1 bits, wrapped around to the end):

0000100110101111000
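Here is a direct, unoptimised translation of the steps above into JavaScript, to make each stage explicit. It enumerates all 2^n strings and tries every rotation, so it is far slower than the implementation below and is meant only for small n. The function names are illustrative.

```javascript
// Smallest rotation of a bit string.
function smallestRotation(s) {
  let best = s;
  for (let r = 1; r < s.length; r++) {
    const rot = s.slice(r) + s.slice(0, r);
    if (rot < best) best = rot;
  }
  return best;
}

// Shortest prefix p such that s is p repeated s.length / p.length times.
function aperiodicPrefix(s) {
  for (let p = 1; p < s.length; p++) {
    if (s.length % p === 0 && s.slice(0, p).repeat(s.length / p) === s) {
      return s.slice(0, p);
    }
  }
  return s;
}

// Enumerate, keep smallest rotations, shorten to aperiodic prefixes, join, append n-1 zeros.
function deBruijnBySteps(n) {
  let seq = "";
  for (let v = 0; v < 2 ** n; v++) {            // all n-bit strings in increasing order
    const s = v.toString(2).padStart(n, "0");
    if (smallestRotation(s) === s) seq += aperiodicPrefix(s);
  }
  return seq + "0".repeat(n - 1);                // unroll the cyclic sequence
}

console.log(deBruijnBySteps(4)); // 0000100110101111000
```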

(further information: F. Ruskey, J. Sawada, A. Williams: "De Bruijn Sequences for Fixed-Weight Binary Strings" and B. Stevens, A. Williams: "The Coolest Order Of Binary Strings", from: "Fun With Algorithms", 2012, pp. 327-328)


I was curious to see how FKM performed compared to my other algorithm, so I wrote this rather clumsy JavaScript implementation. It is indeed much faster, and generates the 1,048,595-digit sequence for N=20 in under a second. In a serious language this should be very fast.

function DeBruijnFKM(n) {
    var seq = "0";                                         // start with 0 precalculated
    for (var i = 1; i < n; i++) {                      // i = number of significant bits
        var zeros = "", max = Math.pow(2, i);
        for (var j = n; j > i; j--) zeros += "0";                   // n-i leading zeros
        for (var k = i > 1 ? max / 2 + 1 : 1; k < max; k += 2) {     // odd numbers only
            var bin = k.toString(2);                           // bin = significant bits
            if (isSmallestRotation(zeros, bin)) {
                seq += aperiodicPrefix(zeros, bin);
            }
        }
    }
    return seq + Math.pow(2, n - 1).toString(2);      // append 2^(N-1): a 1 followed by N-1 zeros

    function isSmallestRotation(zeros, bin) {
        var len = 0, pos = 1;   // len = number of consecutive zeros in significant bits
        for (var i = 1; i < bin.length; i++) {
            if (bin.charAt(i) == "1") {
                if (len > zeros.length) return false;   // more zeros than leading zeros
                if (len == zeros.length
                && zeros + bin > bin.substr(pos) + zeros + bin.substr(0, pos)) {
                    return false;                              // smaller rotation found
                }
                len = 0;
                pos = i + 1;
            }
            else ++len;
        }
        return true;
    }

    function aperiodicPrefix(zeros, bin) {
        if (zeros.length >= bin.length) return zeros + bin;    // too many leading zeros
        bin = zeros + bin;
        for (var i = 2; i <= bin.length / 2; i++) {  // skip 1; not used for 0 and 2^N-1
            if (bin.length % i) continue;
            var pre = bin.substr(0, i);                      // pre = prefix of length i
            for (var j = i; j < bin.length; j += i) {
                if (pre != bin.substr(j, i)) break;              // non-equal part found
            }
            if (j == bin.length) return pre;                      // all parts are equal
        }
        return bin;                                               // no repetition found
    }
}

console.log(DeBruijnFKM(10));
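For comparison, FKM is often written as a short recursive procedure that generates necklaces directly (by extending prenecklaces) instead of filtering all 2^n strings. The sketch below is one such formulation, specialised to binary, with n-1 zeros appended for the linear version. The name `deBruijnRecursive` is illustrative. Recursion depth is only n + 1, so memory use stays small.

```javascript
// Recursive FKM for binary De Bruijn sequences (illustrative sketch).
// a[1..t-1] holds the current prenecklace; p is the length of its longest Lyndon prefix.
function deBruijnRecursive(n) {
  const a = new Array(n + 1).fill(0);
  const out = [];
  function db(t, p) {
    if (t > n) {
      // a[1..n] is a necklace exactly when p divides n; emit its Lyndon root a[1..p].
      if (n % p === 0) {
        for (let j = 1; j <= p; j++) out.push(a[j]);
      }
      return;
    }
    a[t] = a[t - p];            // periodic extension: the period stays p
    db(t + 1, p);
    for (let b = a[t - p] + 1; b <= 1; b++) {
      a[t] = b;                 // larger bit: the prefix becomes a Lyndon word of length t
      db(t + 1, t);
    }
  }
  db(1, 1);
  return out.join("") + "0".repeat(n - 1); // append n-1 zeros for a non-circular sequence
}

console.log(deBruijnRecursive(4)); // 0000100110101111000
```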



Answer 2:


Using the free MiniZinc constraint solver, you can write the search for a given sequence length as follows:

int: n = 3;
int: k = pow(2,n)+n-1;

array[1..k] of var 0..1: a;

constraint
  forall (i in 1..k-n) (
    forall (j in i+1..k-n+1) (
      exists (x in 0..n-1)(
        a[i+x] != a[j+x]
      )
    )
  );

solve satisfy;

output [show(a[m]) | m in 1..k];

For n=3, a longest sequence (one of several) is

1110100011

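k=11 yields UNSATISFIABLE, as it must: 11 bits contain 9 windows of length 3, but only 2^3 = 8 distinct 3-bit patterns exist.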

It took 71 ms to find the sequence of k=10 bits for sub-sequence length n=3. For sub-sequence length n=9, the full sequence of 520 bits (2^9 + 9 - 1) was found in 6.1 s.
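As a cross-check that needs no solver, the same feasibility question can be answered with a plain depth-first search in JavaScript. This is a hand-written stand-in for the MiniZinc model, not a description of how MiniZinc's solvers work, and the name `findSequence` is illustrative. It tries 0 before 1 and backtracks as soon as a window repeats, so its running time grows quickly with n.

```javascript
// Depth-first search for a bit string of length k whose n-length windows are all
// distinct (the constraint of the MiniZinc model). Returns the string, or null.
function findSequence(n, k) {
  const bits = [];
  const seen = new Set();
  function extend() {
    if (bits.length === k) return true;
    for (const b of [0, 1]) {
      bits.push(b);
      let key = null;
      if (bits.length >= n) {
        key = bits.slice(-n).join("");
        if (seen.has(key)) { bits.pop(); continue; } // window repeats: prune
        seen.add(key);
      }
      if (extend()) return true;
      if (key !== null) seen.delete(key);           // undo before trying the next bit
      bits.pop();
    }
    return false;
  }
  return extend() ? bits.join("") : null;
}

console.log(findSequence(3, 10)); // 0001011100
console.log(findSequence(3, 11)); // null
```

Because 0 is tried first, the search returns the lexicographically smallest valid string, which differs from the 1110100011 the solver happened to report but is equally long.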




Answer 3:


An n-bit linear feedback shift register (LFSR), if it runs at maximum period, meets most of the requirements. This is because its state is exactly the n-bit test window. If a bit pattern ever occurred more than once, the register would have returned to an earlier state, and its period would be shorter than the maximum of 2^n - 1.

Unfortunately an LFSR cannot run with a state of zero, so the all-zero pattern never appears. To overcome this, simply prepend n zeros to the bit string.

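#include <stdint.h>
#include <stdio.h>

int parity(uint64_t m);   /* defined below */

/* Output sink for one bit; printing '0'/'1' is an assumed choice, any sink works. */
static void emit(int bit) { putchar('0' + bit); }

void generate(int n) {
  /* polytab[n]: feedback taps of a maximal-length n-bit LFSR (0 = not filled in). */
  static const uint64_t polytab[65] = {
    0x2, 0x2, 0x6, 0xc,
    0x18, 0x28, 0x60, 0xc0,
    0x170,0x220, 0x480, 0xa00,
    0x1052, 0x201a, 0x402a, 0xc000,
    /* table can be completed from: 
     * http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
     */
  };
  if (n < 1 || n > 64 || polytab[n] == 0) return;   /* no taps known for this n */
  uint64_t poly = polytab[n];
  uint64_t m = ~(~UINT64_C(1) << (n - 1));          /* mask of the low n bits */
  uint64_t s = 1;                                   /* any nonzero start state */
  for (int i = 0; i < n; i++) emit(0);              /* the all-zero window */
  do {
    emit(s & 1);                                    /* newest bit of the window */
    s <<= 1;
    s = (s + parity(s & poly)) & m;                 /* shift in the feedback bit */
  } while (s != 1);                                 /* back at the start state */
  putchar('\n');
}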

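If you need a test window longer than 64 bits, just use 64 bits (or, if you must, extend the arithmetic to 128 bits). A repeated window of more than 64 bits would contain a repeated 64-bit window, so the output stays valid and merely falls short of the maximum length. Some other resource (time or storage) will be exhausted long before enough bits are produced to notice that.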

For completeness, a parity function:

int parity(uint64_t m) {
  int p = 0;
  while (m != 0) {
    m &= m - 1;
    p ^= 1;
  }
  return p;
}

Outputs for n=3, 4, and 5:

3: 0001011100
4: 0000100110101111000
5: 000001001011001111100011011101010000
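The three listed outputs can be checked mechanically. Each should have length 2^n + n - 1 and contain 2^n distinct n-bit windows. Here is a small JavaScript sketch with illustrative names:

```javascript
// Outputs listed above for the LFSR generator, keyed by n.
const lfsrOutputs = {
  3: "0001011100",
  4: "0000100110101111000",
  5: "000001001011001111100011011101010000",
};

// True when s has the maximum length 2^n + n - 1 and all its n-length windows differ.
function isMaximal(s, n) {
  if (s.length !== 2 ** n + n - 1) return false;
  const windows = new Set();
  for (let i = 0; i + n <= s.length; i++) windows.add(s.slice(i, i + n));
  return windows.size === 2 ** n;
}

for (const [n, s] of Object.entries(lfsrOutputs)) {
  console.log(n, isMaximal(s, Number(n)));
}
```

All three lines should report true.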



Answer 4:


Before finding the FKM algorithm, I fiddled around with a simple recursive algorithm that tries every combination of 0's and 1's and returns the (lexicographically) first result. I found that this method quickly runs out of memory (at least in JavaScript in a browser), so I tried to come up with an improved non-recursive version, based on these observations:

  • By running through the N-length binary strings from 0 to 2^N - 1, checking whether each one is already present in the sequence and, if not, checking whether it partially overlaps the end of the sequence, you can build up the lexicographically smallest binary De Bruijn sequence in N-length chunks instead of bit by bit.

  • You only need to go through the N-length binary strings up to 2^(N-1) - 1, and then append 2^(N-1) without overlap. The N-length strings starting with a '1' need not be checked.

  • You can skip the even numbers greater than 2; they are bit-shifted versions of smaller numbers that are already in the sequence. The number 2 is needed to avoid 1 and 3 incorrectly overlapping; code-wise, you can fix this by starting the sequence with 0, 1 and 2 already in place (e.g. 0000010 for N=5) and then iterating over every odd number starting at 3.

Example for N=5:

 0    00000
 1     00001
 2      00010
 3          00011
 4      (00100)
 5               00101
 6          (00110)
 7                    00111
 8       (01000)
 9                 (01001)
10               (01010)
11                         01011
12           (01100)
13                           01101
14                    (01110)
15                              01111
                                    +10000
=>    000001000110010100111010110111110000

As you can see, the sequence is built with the strings 00000 to 01111 and the appended 10000, and the strings 10001 to 11111 need not be checked. All even numbers greater than 2 can be skipped (as could the numbers 9 and 13).

This code example shows a simple implementation in JavaScript. It's fast up to N=14 or so, and will give you all 1,048,595 characters for N=20 if you have a few minutes.

function binaryDeBruijn(n) {
    var zeros = "", max = Math.pow(2, n - 1);             // check only up to 2^(N-1)
    for (var i = 1; i < n; i++) zeros += "0";
    var seq = zeros + (n > 2 ? "010" : n == 2 ? "01" : "0"); // start with 0-2 precalculated (0-1 for N=2)
    for (var i = 3; i < max; i += 2) {                    // odd numbers from 3
        var part = (zeros + i.toString(2)).substr(-n, n); // binary with leading zeros
        if (seq.indexOf(part) == -1) {                    // part not already in sequence
            for (var j = n - 1; j > 0; j--) {             // try partial match at end
                if (seq.substr(-j, j) == part.substr(0, j)) break; // partial match found
            }
            seq += part.substr(j, n);                     // overlap with end or append
        }
    }
    return seq + "1" + zeros;                             // append 2^(N-1)
}

console.log(binaryDeBruijn(10));

There are other numbers besides the even numbers which could be skipped (e.g. the numbers 9 and 13 in the example); if you could predict these numbers, this would of course make the algorithm much more efficient, but I'm not sure there's an obvious pattern there.
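One candidate pattern is offered here as an observation, not as part of this answer. In the N=5 example, 9 (01001) and 13 (01101) are exactly the odd numbers below 2^(N-1) that are not their own smallest rotation. That is the same filter the FKM algorithm in Answer 1 applies before joining the remaining strings without overlap. The illustrative function below lists those numbers for a given N. Whether the greedy method here could skip them without changing its overlap logic is not established.

```javascript
// Odd numbers from 3 up to 2^(n-1) - 1 whose n-bit form is NOT its own smallest rotation.
function nonNecklaceOddNumbers(n) {
  const max = 2 ** (n - 1);
  const result = [];
  for (let i = 3; i < max; i += 2) {
    const s = i.toString(2).padStart(n, "0");
    for (let r = 1; r < n; r++) {
      if (s.slice(r) + s.slice(0, r) < s) { result.push(s); break; } // smaller rotation exists
    }
  }
  return result;
}

console.log(nonNecklaceOddNumbers(5)); // [ '01001', '01101' ]  (9 and 13)
```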



Source: https://stackoverflow.com/questions/35370539/longest-binary-sequence-with-no-equal-n-length-subsequences
