Question
We are looking for an algorithm that meets the following criteria.
The input is an arbitrary positive integer n, the length of the subsequences to compare.
We want the longest binary sequence that contains no two equal n-length subsequences (contiguous runs of n bits). Matching subsequences may overlap (the variant where matches must be disjoint is also an interesting problem). The output is this sequence of bits.
For example, if n = 3:
10111010 is invalid because the subsequence 101 repeats.
01010 is also invalid because 010 occurs more than once.
01101001 is valid, but evidently not the longest possible sequence.
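The validity rule in the examples above can be sketched as a small checker. This is only an illustration; the function name hasNoRepeatedWindow is made up here and is not part of the question.

```javascript
// Hypothetical helper: returns true when no n-length window
// (contiguous run of n bits) occurs more than once in `bits`.
function hasNoRepeatedWindow(bits, n) {
    var seen = new Set();
    for (var i = 0; i + n <= bits.length; i++) {
        var w = bits.slice(i, i + n);      // window starting at position i
        if (seen.has(w)) return false;     // same window seen earlier (overlap allowed)
        seen.add(w);
    }
    return true;
}

console.log(hasNoRepeatedWindow("10111010", 3)); // false, 101 repeats
console.log(hasNoRepeatedWindow("01010", 3));    // false, 010 repeats
console.log(hasNoRepeatedWindow("01101001", 3)); // true
```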
Answer 1:
Googling for binary De Bruijn sequence algorithms, I found this one where you can actually tell what's happening. Known as the "FKM algorithm" (after Fredricksen, Kessler and Maiorana), it finds the lexicographically least De Bruijn sequence using the "necklace prefix" method. I'll explain using the example with n=4.
First, create all binary sequences of length n, i.e. all numbers from 0 to 2^n - 1:
0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
Then, remove the sequences which are not in their lowest rotation, e.g. 0110 can be rotated to 0011, which is smaller:
0000, 0001, 0011, 0101, 0111, 1111
(You'll notice that this removes, among others, all even numbers except 0000 and all numbers greater than 0111 except 1111, which helps to simplify the code.)
Then reduce the sequences to their "aperiodic prefix", i.e. if a sequence is a repetition of a shorter sequence, use that shorter sequence; e.g. 0101 is a repetition of 01, and 1111 is a repetition of 1:
0, 0001, 0011, 01, 0111, 1
Join the sequences, and you have a De Bruijn sequence:
0000100110101111
For a non-circular sequence, append n-1 zeros (the first n-1 bits of the circular sequence):
0000100110101111000
(further information: F. Ruskey, J. Sawada, A. Williams: "De Bruijn Sequences for Fixed-Weight Binary Strings" and B. Stevens, A. Williams: "The Coolest Order Of Binary Strings", from: "Fun With Algorithms", 2012, pp. 327-328)
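The steps above can be followed literally in a few lines. This is a naive sketch for small n that enumerates all 2^n strings; it is not the optimized implementation, and the name deBruijnByNecklaces is invented for illustration.

```javascript
// Naive necklace-prefix construction, assuming n is small enough
// that looping over all 2^n strings is acceptable.
function deBruijnByNecklaces(n) {
    var seq = "";
    for (var v = 0; v < Math.pow(2, n); v++) {
        var s = v.toString(2).padStart(n, "0");   // n-bit string, in ascending order
        // Step 2: keep s only if no rotation of s is smaller.
        var lowest = true;
        for (var r = 1; r < n; r++) {
            if (s.slice(r) + s.slice(0, r) < s) { lowest = false; break; }
        }
        if (!lowest) continue;
        // Step 3: reduce s to its aperiodic prefix (shortest repeating unit).
        for (var len = 1; len <= n; len++) {
            if (n % len === 0 && s.slice(0, len).repeat(n / len) === s) {
                seq += s.slice(0, len);            // Step 4: join the prefixes
                break;
            }
        }
    }
    return seq;
}

var circular = deBruijnByNecklaces(4);
console.log(circular);                        // 0000100110101111
console.log(circular + circular.slice(0, 3)); // 0000100110101111000
```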
I was curious to see how FKM performed compared to my other algorithm, so I wrote this rather clumsy JavaScript implementation. It is indeed much faster, and generates the 1,048,595 digit sequence for N=20 in under a second. In a serious language this should be very fast.
function DeBruijnFKM(n) {
    var seq = "0";                                    // start with 0 precalculated
    for (var i = 1; i < n; i++) {                     // i = number of significant bits
        var zeros = "", max = Math.pow(2, i);
        for (var j = n; j > i; j--) zeros += "0";     // n-i leading zeros
        for (var k = i > 1 ? max / 2 + 1 : 1; k < max; k += 2) { // odd numbers only
            var bin = k.toString(2);                  // bin = significant bits
            if (isSmallestRotation(zeros, bin)) {
                seq += aperiodicPrefix(zeros, bin);
            }
        }
    }
    // append "1" (the aperiodic prefix of 2^N-1) followed by n-1 trailing zeros
    return seq + Math.pow(2, n - 1).toString(2);

    function isSmallestRotation(zeros, bin) {
        var len = 0, pos = 1;         // len = number of consecutive zeros in significant bits
        for (var i = 1; i < bin.length; i++) {
            if (bin.charAt(i) == "1") {
                if (len > zeros.length) return false; // more zeros than leading zeros
                if (len == zeros.length
                    && zeros + bin > bin.substr(pos) + zeros + bin.substr(0, pos)) {
                    return false;                     // smaller rotation found
                }
                len = 0;
                pos = i + 1;
            }
            else ++len;
        }
        return true;
    }

    function aperiodicPrefix(zeros, bin) {
        if (zeros.length >= bin.length) return zeros + bin; // too many leading zeros
        bin = zeros + bin;
        for (var i = 2; i <= bin.length / 2; i++) {   // skip 1; not used for 0 and 2^N-1
            if (bin.length % i) continue;
            var pre = bin.substr(0, i);               // pre = prefix of length i
            for (var j = i; j < bin.length; j += i) {
                if (pre != bin.substr(j, i)) break;   // non-equal part found
            }
            if (j == bin.length) return pre;          // all parts are equal
        }
        return bin;                                   // no repetition found
    }
}
document.write(DeBruijnFKM(10));
Answer 2:
Using the free MiniZinc constraint solver, you can write the search for a given sequence length as follows:
int: n = 3;
int: k = pow(2,n)+n-1;
array[1..k] of var 0..1: a;

constraint
  forall (i in 1..k-n) (
    forall (j in i+1..k-n+1) (
      exists (x in 0..n-1)(
        a[i+x] != a[j+x]
      )
    )
  );

solve satisfy;

output [show(a[m]) | m in 1..k];
For n=3, the longest sequence (k=10) is
1110100011
whereas k=11 yields UNSATISFIABLE.
It took 71ms to find the sequence on k=10 bits for sub-sequence length n=3. For sub-sequence length n=9, the total sequence of 520 bits was found in 6.1s.
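For readers without a solver at hand, the same satisfiability question can be posed as a plain backtracking search. The sketch below is only practical for small n, is not equivalent to how MiniZinc searches, and the name searchSequence is hypothetical.

```javascript
// Hypothetical exhaustive search: returns some bit string of length k
// in which every n-length window is distinct, or null if none exists.
function searchSequence(n, k) {
    var seen = new Set();                  // windows used on the current path
    function extend(seq) {
        if (seq.length === k) return seq;
        for (var b = 0; b <= 1; b++) {
            var next = seq + b;
            var w = next.length >= n ? next.slice(-n) : null; // newest window, if complete
            if (w !== null && seen.has(w)) continue;          // would repeat a window
            if (w !== null) seen.add(w);
            var found = extend(next);
            if (found !== null) return found;
            if (w !== null) seen.delete(w);                   // backtrack
        }
        return null;
    }
    return extend("");
}

console.log(searchSequence(3, 10)); // a valid 10-bit sequence
console.log(searchSequence(3, 11)); // null, matching UNSATISFIABLE
```

The null result for k=11 follows from counting: there are only 2^n distinct n-bit windows, and a sequence of length k contains k-n+1 of them, so k cannot exceed 2^n + n - 1.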
Answer 3:
An n-bit linear feedback shift register (LFSR), if it operates at maximum period, meets most of the requirements. This is because its state is exactly the size of the test window. If a bit pattern ever occurred more than once, the register would have returned to a previous state, and its period would be shorter than expected.
Unfortunately an LFSR cannot run with a state of zero, so the all-zero window never appears. To overcome this, simply prepend n zeroes to the bit string.
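#include <stdint.h>
#include <stdio.h>

int parity(uint64_t m);   /* defined below */

/* Output hook; printing each bit as a character is an assumption. */
static void emit(int bit) { putchar('0' + bit); }

void generate(int n) {
    /* Tap masks indexed by n. Unlisted entries are zero, so n must stay
     * within the filled part of the table. */
    static const uint64_t polytab[64] = {
        0x2, 0x2, 0x6, 0xc,
        0x18, 0x28, 0x60, 0xc0,
        0x170,0x220, 0x480, 0xa00,
        0x1052, 0x201a, 0x402a, 0xc000,
        /* table can be completed from:
         * http://www.xilinx.com/support/documentation/application_notes/xapp052.pdf
         */
    };
    uint64_t poly = polytab[n];
    uint64_t m = ~(~UINT64_C(1) << (n - 1));   /* mask with the low n bits set */
    uint64_t s = 1;
    int i;
    for (i = 0; i < n; i++) emit(0);           /* the n leading zeros */
    do {
        emit((int)(s & 1));
        s <<= 1;
        s = (s + parity(s & poly)) & m;
    } while (s != 1);
}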
If you need a test window longer than 64 bits then just use 64 bits (or if you must you can extend the arithmetic to 128 bits). Beyond 64 bits some other resource will be exhausted before it is discovered that the bit string is not maximum length.
For completeness, a parity function:
int parity(uint64_t m) {
    int p = 0;
    while (m != 0) {
        m &= m - 1;   /* clear the lowest set bit */
        p ^= 1;
    }
    return p;
}
Outputs for n=3, 4, and 5:
3: 0001011100
4: 0000100110101111000
5: 000001001011001111100011011101010000
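To reproduce these outputs without a C toolchain, here is a rough JavaScript port. It is a sketch under two assumptions: it copies only the 16 tap masks listed in the C table, and it relies on 32-bit integer operations, so it is limited to n from 1 to 15. The name lfsrSequence is made up for this example.

```javascript
// Hypothetical port of generate() for small n; returns the bits as a string.
function lfsrSequence(n) {
    // Same tap masks as the first 16 entries of the C table, indexed by n.
    var polytab = [0x2, 0x2, 0x6, 0xc, 0x18, 0x28, 0x60, 0xc0,
                   0x170, 0x220, 0x480, 0xa00, 0x1052, 0x201a, 0x402a, 0xc000];
    if (n < 1 || n >= polytab.length) throw new RangeError("n must be 1..15");
    var poly = polytab[n], mask = (1 << n) - 1, s = 1;
    var out = "0".repeat(n);               // the n leading zeros
    do {
        out += s & 1;                      // emit the low bit of the state
        s <<= 1;
        s = (s + parity(s & poly)) & mask; // feed the tap parity back in
    } while (s !== 1);
    return out;
}

// Parity of the set bits in m (1 if odd, 0 if even).
function parity(m) {
    var p = 0;
    while (m !== 0) { m &= m - 1; p ^= 1; }
    return p;
}

console.log(lfsrSequence(3)); // 0001011100
console.log(lfsrSequence(4)); // 0000100110101111000
```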
Answer 4:
Before finding the FKM algorithm, I fiddled around with a simple recursive algorithm that tries every combination of 0's and 1's and returns the (lexicographically) first result. I found that this method quickly runs out of memory (at least in JavaScript in a browser), so I tried to come up with an improved non-recursive version, based on these observations:
By running through the N-length binary strings from 0 to 2^N - 1, checking whether each is already present in the sequence, and if not, checking whether it overlaps partially with the end of the sequence, you can build up the lexicographically smallest binary De Bruijn sequence in N-length chunks instead of bit by bit.
You only need to go through the N-length binary strings up to 2^(N-1) - 1, and then append 2^(N-1) without overlap. The N-length strings starting with a '1' need not be checked.
You can skip the even numbers greater than 2; they are bit-shifted versions of smaller numbers that are already in the sequence. The number 2 is needed to avoid 1 and 3 incorrectly overlapping; code-wise, you can fix this by starting the sequence with 0, 1 and 2 already in place (e.g. 0000010 for N=5) and then iterating over every odd number starting at 3.
Example for N=5:
0 00000
1 00001
2 00010
3 00011
4 (00100)
5 00101
6 (00110)
7 00111
8 (01000)
9 (01001)
10 (01010)
11 01011
12 (01100)
13 01101
14 (01110)
15 01111
+10000
=> 000001000110010100111010110111110000
As you can see, the sequence is built with the strings 00000 to 01111 and the appended 10000, and the strings 10001 to 11111 need not be checked. All even numbers greater than 2 can be skipped (as could the numbers 9 and 13).
This code example shows a simple implementation in JavaScript. It's fast up to N=14 or so, and will give you all 1,048,595 characters for N=20 if you have a few minutes.
function binaryDeBruijn(n) {
    var zeros = "", max = Math.pow(2, n - 1);         // check only up to 2^(N-1)
    for (var i = 1; i < n; i++) zeros += "0";
    // start with 0-2 precalculated; for N=2 only 0 and 1 fit, so start with "001"
    var seq = zeros + (n > 2 ? "010" : n == 2 ? "01" : "0");
    for (var i = 3; i < max; i += 2) {                // odd numbers from 3
        var part = (zeros + i.toString(2)).substr(-n, n); // binary with leading zeros
        if (seq.indexOf(part) == -1) {                // part not already in sequence
            for (var j = n - 1; j > 0; j--) {         // try partial match at end
                if (seq.substr(-j, j) == part.substr(0, j)) break; // partial match found
            }
            seq += part.substr(j, n);                 // overlap with end or append
        }
    }
    return seq + "1" + zeros;                         // append 2^(N-1)
}
document.write(binaryDeBruijn(10));
There are other numbers besides the even numbers which could be skipped (e.g. the numbers 9 and 13 in the example); if you could predict these numbers, this would of course make the algorithm much more efficient, but I'm not sure there's an obvious pattern there.
Source: https://stackoverflow.com/questions/35370539/longest-binary-sequence-with-no-equal-n-length-subsequences