O(nlogn) Algorithm - Find three evenly spaced ones within binary string

前端未结

关注

 30  2931

I had this question on an Algorithms test yesterday, and I can\'t figure out the answer. It is driving me absolutely crazy, because it was worth about 40 points. I figure

相关标签:

30条回答

梦毁少年i

2020-11-28 00:47

I thought I'd add one comment before posting the 22nd naive solution to the problem. For the naive solution, we don't need to show that the number of 1's in the string is at most O(log(n)), but rather that it is at most O(sqrt(n*log(n)).

Solver:

def solve(Str):
    indexes=[]
    #O(n) setup
    for i in range(len(Str)):
        if Str[i]=='1':
            indexes.append(i)

    #O((number of 1's)^2) processing
    for i in range(len(indexes)):
        for j in range(i+1, len(indexes)):
                            indexDiff = indexes[j] - indexes[i]
            k=indexes[j] + indexDiff
            if k<len(Str) and Str[k]=='1':
                return True
    return False

It's basically a fair bit similar to flybywire's idea and implementation, though looking ahead instead of back.

Greedy String Builder:

#assumes final char hasn't been added, and would be a 1 
def lastCharMakesSolvable(Str):
    endIndex=len(Str)
    j=endIndex-1
    while j-(endIndex-j) >= 0:
        k=j-(endIndex-j)
        if k >= 0 and Str[k]=='1' and Str[j]=='1':
            return True
        j=j-1
    return False



def expandString(StartString=''):
    if lastCharMakesSolvable(StartString):
        return StartString + '0'
    return StartString + '1'

n=1
BaseStr=""
lastCount=0
while n<1000000:
    BaseStr=expandString(BaseStr)
    count=BaseStr.count('1')
    if count != lastCount:
        print(len(BaseStr), count)
    lastCount=count
    n=n+1

(In my defense, I'm still in the 'learn python' stage of understanding)

Also, potentially useful output from the greedy building of strings, there's a rather consistent jump after hitting a power of 2 in the number of 1's... which I was not willing to wait around to witness hitting 2096.

strlength   # of 1's
    1    1
    2    2
    4    3
    5    4
   10    5
   14    8
   28    9
   41    16
   82    17
  122    32
  244    33
  365    64
  730    65
 1094    128
 2188    129
 3281    256
 6562    257
 9842    512
19684    513
29525    1024

0 讨论(0)

伪装坚强ぢ

2020-11-28 00:49

Assumption:

Just wrong, talking about log(n) number of upper limit of ones

EDIT:

Now I found that using Cantor numbers (if correct), density on set is (2/3)^Log_3(n) (what a weird function) and I agree, log(n)/n density is to strong.

If this is upper limit, there is algorhitm who solves this problem in at least O(n*(3/2)^(log(n)/log(3))) time complexity and O((3/2)^(log(n)/log(3))) space complexity. (check Justice's answer for algorhitm)

This is still by far better than O(n^2)

This function ((3/2)^(log(n)/log(3))) really looks like n*log(n) on first sight.

How did I get this formula?

Applaying Cantors number on string.
Supose that length of string is 3^p == n
At each step in generation of Cantor string you keep 2/3 of prevous number of ones. Apply this p times.

That mean (n * ((2/3)^p)) -> (((3^p)) * ((2/3)^p)) remaining ones and after simplification 2^p. This mean 2^p ones in 3^p string -> (3/2)^p ones . Substitute p=log(n)/log(3) and get
((3/2)^(log(n)/log(3)))

0 讨论(0)
发布评论:

提交评论
- 加载中...

长情又很酷

2020-11-28 00:50

I think this algorithm has O(n log n) complexity (C++, DevStudio 2k5). Now, I don't know the details of how to analyse an algorithm to determine its complexity, so I have added some metric gathering information to the code. The code counts the number of tests done on the sequence of 1's and 0's for any given input (hopefully, I've not made a balls of the algorithm). We can compare the actual number of tests against the O value and see if there's a correlation.

#include <iostream>
using namespace std;

bool HasEvenBits (string &sequence, int &num_compares)
{
  bool
    has_even_bits = false;

  num_compares = 0;

  for (unsigned i = 1 ; i <= (sequence.length () - 1) / 2 ; ++i)
  {
    for (unsigned j = 0 ; j < sequence.length () - 2 * i ; ++j)
    {
      ++num_compares;
      if (sequence [j] == '1' && sequence [j + i] == '1' && sequence [j + i * 2] == '1')
      {
        has_even_bits = true;
        // we could 'break' here, but I want to know the worst case scenario so keep going to the end
      }
    }
  }

  return has_even_bits;
}

int main ()
{
  int
    count;

  string
    input = "111";

  for (int i = 3 ; i < 32 ; ++i)
  {
    HasEvenBits (input, count);
    cout << i << ", " << count << endl;
    input += "0";
  }
}

This program outputs the number of tests for each string length up to 32 characters. Here's the results:

 n  Tests  n log (n)
=====================
 3     1     1.43
 4     2     2.41
 5     4     3.49
 6     6     4.67
 7     9     5.92
 8    12     7.22
 9    16     8.59
10    20    10.00
11    25    11.46
12    30    12.95
13    36    14.48
14    42    16.05
15    49    17.64
16    56    19.27
17    64    20.92
18    72    22.59
19    81    24.30
20    90    26.02
21   100    27.77
22   110    29.53
23   121    31.32
24   132    33.13
25   144    34.95
26   156    36.79
27   169    38.65
28   182    40.52
29   196    42.41
30   210    44.31
31   225    46.23

I've added the 'n log n' values as well. Plot these using your graphing tool of choice to see a correlation between the two results. Does this analysis extend to all values of n? I don't know.

0 讨论(0)

悲哀的现实

2020-11-28 00:54
Finally! Following up leads in sdcvvc's answer, we have it: the O(n log n) algorithm for the problem! It is simple too, after you understand it. Those who guessed FFT were right.

The problem: we are given a binary string S of length n, and we want to find three evenly spaced 1s in it. For example, S may be 110110010, where n=9. It has evenly spaced 1s at positions 2, 5, and 8.
1. Scan S left to right, and make a list L of positions of 1. For the S=110110010 above, we have the list L = [1, 2, 4, 5, 8]. This step is O(n). The problem is now to find an arithmetic progression of length 3 in L, i.e. to find distinct a, b, c in L such that b-a = c-b, or equivalently a+c=2b. For the example above, we want to find the progression (2, 5, 8).
2. Make a polynomial p with terms x^k for each k in L. For the example above, we make the polynomial p(x) = (x + x² + x⁴ + x⁵+x⁸). This step is O(n).
3. Find the polynomial q = p², using the Fast Fourier Transform. For the example above, we get the polynomial q(x) = x¹⁶ + 2x¹³ + 2x¹² + 3x¹⁰ + 4x⁹ + x⁸ + 2x⁷ + 4x⁶ + 2x⁵ + x⁴ + 2x³ + x². This step is O(n log n).
4. Ignore all terms except those corresponding to x^2k for some k in L. For the example above, we get the terms x¹⁶, 3x¹⁰, x⁸, x⁴, x². This step is O(n), if you choose to do it at all.
Here's the crucial point: the coefficient of any x^2b for b in L is precisely the number of pairs (a,c) in L such that a+c=2b. [CLRS, Ex. 30.1-7] One such pair is (b,b) always (so the coefficient is at least 1), but if there exists any other pair (a,c), then the coefficient is at least 3, from (a,c) and (c,a). For the example above, we have the coefficient of x¹⁰ to be 3 precisely because of the AP (2,5,8). (These coefficients x^2b will always be odd numbers, for the reasons above. And all other coefficients in q will always be even.)

So then, the algorithm is to look at the coefficients of these terms x^2b, and see if any of them is greater than 1. If there is none, then there are no evenly spaced 1s. If there is a b in L for which the coefficient of x^2b is greater than 1, then we know that there is some pair (a,c) — other than (b,b) — for which a+c=2b. To find the actual pair, we simply try each a in L (the corresponding c would be 2b-a) and see if there is a 1 at position 2b-a in S. This step is O(n).

That's all, folks.

One might ask: do we need to use FFT? Many answers, such as beta's, flybywire's, and rsp's, suggest that the approach that checks each pair of 1s and sees if there is a 1 at the "third" position, might work in O(n log n), based on the intuition that if there are too many 1s, we would find a triple easily, and if there are too few 1s, checking all pairs takes little time. Unfortunately, while this intuition is correct and the simple approach is better than O(n²), it is not significantly better. As in sdcvvc's answer, we can take the "Cantor-like set" of strings of length n=3^k, with 1s at the positions whose ternary representation has only 0s and 2s (no 1s) in it. Such a string has 2^k = n^{(log 2)/(log 3)} ≈ n^0.63 ones in it and no evenly spaced 1s, so checking all pairs would be of the order of the square of the number of 1s in it: that's 4^k ≈ n^1.26 which unfortunately is asymptotically much larger than (n log n). In fact, the worst case is even worse: Leo Moser in 1953 constructed (effectively) such strings which have n^{1-c/√(log n)} 1s in them but no evenly spaced 1s, which means that on such strings, the simple approach would take Θ(n^{2-2c/√(log n)}) — only a tiny bit better than Θ(n²), surprisingly!

About the maximum number of 1s in a string of length n with no 3 evenly spaced ones (which we saw above was at least n^0.63 from the easy Cantor-like construction, and at least n^{1-c/√(log n)} with Moser's construction) — this is OEIS A003002. It can also be calculated directly from OEIS A065825 as the k such that A065825(k) ≤ n < A065825(k+1). I wrote a program to find these, and it turns out that the greedy algorithm does not give the longest such string. For example, for n=9, we can get 5 1s (110100011) but the greedy gives only 4 (110110000), for n=26 we can get 11 1s (11001010001000010110001101) but the greedy gives only 8 (11011000011011000000000000), and for n=74 we can get 22 1s (11000010110001000001011010001000000000000000010001011010000010001101000011) but the greedy gives only 16 (11011000011011000000000000011011000011011000000000000000000000000000000000). They do agree at quite a few places until 50 (e.g. all of 38 to 50), though. As the OEIS references say, it seems that Jaroslaw Wroblewski is interested in this question, and he maintains a website on these non-averaging sets. The exact numbers are known only up to 194.
0 讨论(0)
发布评论:

提交评论
- 加载中...
一个人的身影

2020-11-28 00:54
I'll give my rough guess here, and let those who are better with calculating complexity to help me on how my algorithm fares in O-notation wise
1. given binary string 0000010101000100 (as example)
2. crop head and tail of zeroes -> 00000 101010001 00
3. we get 101010001 from previous calculation
4. check if the middle bit is 'one', if true, found valid three evenly spaced 'ones' (only if the number of bits is odd numbered)
5. correlatively, if the remained cropped number of bits is even numbered, the head and tail 'one' cannot be part of evenly spaced 'one',
6. we use 1010100001 as example (with an extra 'zero' to become even numbered crop), in this case we need to crop again, then becomes -> 10101 00001
7. we get 10101 from previous calculation, and check middle bit, and we found the evenly spaced bit again
I have no idea how to calculate complexity for this, can anyone help?

edit: add some code to illustrate my idea

edit2: tried to compile my code and found some major mistakes, fixed
```
char *binaryStr = "0000010101000100";

int main() {
   int head, tail, pos;
   head = 0;
   tail = strlen(binaryStr)-1;
   if( (pos = find3even(head, tail)) >=0 )
      printf("found it at position %d\n", pos);
   return 0;
}

int find3even(int head, int tail) {
   int pos = 0;
   if(head >= tail) return -1;
   while(binaryStr[head] == '0') 
      if(head<tail) head++;
   while(binaryStr[tail] == '0') 
      if(head<tail) tail--;
   if(head >= tail) return -1;
   if( (tail-head)%2 == 0 && //true if odd numbered
       (binaryStr[head + (tail-head)/2] == '1') ) { 
         return head;
   }else {
      if( (pos = find3even(head, tail-1)) >=0 )
         return pos;
      if( (pos = find3even(head+1, tail)) >=0 )
         return pos;
   }
   return -1;
}
```
0 讨论(0)
发布评论:

提交评论
- 加载中...
小蘑菇

2020-11-28 00:55
One inroad into the problem is to think of factors and shifting.

With shifting, you compare the string of ones and zeroes with a shifted version of itself. You then take matching ones. Take this example shifted by two:
```
1010101010
  1010101010
------------
001010101000
```
The resulting 1's (bitwise ANDed), must represent all those 1's which are evenly spaced by two. The same example shifted by three:
```
1010101010
   1010101010
-------------
0000000000000
```
In this case there are no 1's which are evenly spaced three apart.

So what does this tell you? Well that you only need to test shifts which are prime numbers. For example say you have two 1's which are six apart. You would only have to test 'two' shifts and 'three' shifts (since these divide six). For example:
```
10000010 
  10000010 (Shift by two)
    10000010
      10000010 (We have a match)

10000010
   10000010 (Shift by three)
      10000010 (We have a match)
```
So the only shifts you ever need to check are 2,3,5,7,11,13 etc. Up to the prime closest to the square root of size of the string of digits.

Nearly solved?

I think I am closer to a solution. Basically:
1. Scan the string for 1's. For each 1 note it's remainder after taking a modulus of its position. The modulus ranges from 1 to half the size of the string. This is because the largest possible separation size is half the string. This is done in O(n^2). BUT. Only prime moduli need be checked so O(n^2/log(n))
2. Sort the list of modulus/remainders in order largest modulus first, this can be done in O(n*log(n)) time.
3. Look for three consecutive moduli/remainders which are the same.
4. Somehow retrieve the position of the ones!
I think the biggest clue to the answer, is that the fastest sort algorithms, are O(n*log(n)).

WRONG

Step 1 is wrong as pointed out by a colleague. If we have 1's at position 2,12 and 102. Then taking a modulus of 10, they would all have the same remainders, and yet are not equally spaced apart! Sorry.
0 讨论(0)
发布评论:

提交评论
- 加载中...