O(nlogn) Algorithm - Find three evenly spaced ones within binary string

刺人心 2020-11-28 00:07

I had this question on an Algorithms test yesterday, and I can't figure out the answer. It is driving me absolutely crazy, because it was worth about 40 points. I figure

30 Answers
  • 2020-11-28 00:47

    I thought I'd add one comment before posting the 22nd naive solution to the problem. For the naive solution to run in O(n log n), we don't need to show that the number of 1's in the string is at most O(log(n)), but rather that it is at most O(sqrt(n*log(n))), since checking every pair of 1's costs the square of their number.

    Solver:

    def solve(Str):
        indexes=[]
        #O(n) setup: record the position of every '1'
        for i in range(len(Str)):
            if Str[i]=='1':
                indexes.append(i)
    
        #O((number of 1's)^2) processing: for each pair of 1's, look ahead for a third
        for i in range(len(indexes)):
            for j in range(i+1, len(indexes)):
                indexDiff = indexes[j] - indexes[i]
                k=indexes[j] + indexDiff
                if k<len(Str) and Str[k]=='1':
                    return True
        return False
    

    It's fairly similar to flybywire's idea and implementation, though it looks ahead instead of back.

    Greedy String Builder:

    #assumes final char hasn't been added, and would be a 1 
    def lastCharMakesSolvable(Str):
        endIndex=len(Str)
        j=endIndex-1
        while j-(endIndex-j) >= 0:
            k=j-(endIndex-j)
            if k >= 0 and Str[k]=='1' and Str[j]=='1':
                return True
            j=j-1
        return False
    
    
    
    def expandString(StartString=''):
        if lastCharMakesSolvable(StartString):
            return StartString + '0'
        return StartString + '1'
    
    n=1
    BaseStr=""
    lastCount=0
    while n<1000000:
        BaseStr=expandString(BaseStr)
        count=BaseStr.count('1')
        if count != lastCount:
            print(len(BaseStr), count)
        lastCount=count
        n=n+1
    

    (In my defense, I'm still in the 'learn python' stage of understanding)

    Also, here is some potentially useful output from the greedy string builder. There's a rather consistent jump right after the number of 1's hits a power of 2... which I was not willing to wait around to witness at 2048.

    strlength   # of 1's
        1    1
        2    2
        4    3
        5    4
       10    5
       14    8
       28    9
       41    16
       82    17
      122    32
      244    33
      365    64
      730    65
     1094    128
     2188    129
     3281    256
     6562    257
     9842    512
    19684    513
    29525    1024
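
    The jumps in this table seem to follow a simple rule, which can be checked without running the slow builder. A hedged sketch (greedy_length is a made-up helper, and the ternary rule is an observation that matches every row above, not something derived in this answer): with 0-based positions, the k-th 1 of the greedy string appears to sit at the position obtained by writing k-1 in binary and reading those digits in base 3.

```python
def greedy_length(k):
    """Length of the greedy string right after its k-th 1 is appended (k >= 1).

    Assumption: the k-th 1 sits at the 0-based position whose base-3 digits
    are the binary digits of k-1. This matches the table above.
    """
    position = int(bin(k - 1)[2:], 3)
    return position + 1


# (number of 1's -> string length) rows copied from the table above
table = {1: 1, 2: 2, 3: 4, 4: 5, 5: 10, 8: 14, 9: 28, 16: 41, 17: 82,
         32: 122, 33: 244, 64: 365, 65: 730, 128: 1094, 129: 2188,
         256: 3281, 257: 6562, 512: 9842, 513: 19684, 1024: 29525}

for ones, length in table.items():
    assert greedy_length(ones) == length

# Extrapolated rows, valid only if the rule keeps holding
print(greedy_length(1025), greedy_length(2048))
```

    If the rule holds, a count that is a power of 2 corresponds to a position whose ternary digits are all 1, and the next 1 can only go at the next power of 3, which would explain why the length roughly doubles at each of those rows.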
    
  • 2020-11-28 00:49

    Assumption:

    Just wrong: I was talking about log(n) as the upper limit on the number of ones.

    EDIT:

    Now I have found that, using Cantor numbers (if correct), the density of the set is (2/3)^log_3(n) (what a weird function), and I agree that a log(n)/n density is too strong.

    If this is the upper limit, there is an algorithm that solves this problem in at least O(n*(3/2)^(log(n)/log(3))) time complexity and O((3/2)^(log(n)/log(3))) space complexity. (Check Justice's answer for the algorithm.)

    This is still far better than O(n^2).

    The function ((3/2)^(log(n)/log(3))) really looks like n*log(n) at first sight.

    How did I get this formula?

    By applying Cantor numbers to the string.
    Suppose that the length of the string is 3^p == n.
    At each step in the generation of the Cantor string you keep 2/3 of the previous number of ones. Apply this p times.

    That means (n * ((2/3)^p)) -> (((3^p)) * ((2/3)^p)) remaining ones, which simplifies to 2^p. So there are 2^p ones in a string of length 3^p, i.e. one 1 per (3/2)^p positions on average. Substitute p=log(n)/log(3) and get
    ((3/2)^(log(n)/log(3)))
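
    The 2^p count can be checked empirically. A minimal sketch, assuming one common form of the construction (1s at the 0-based positions whose ternary digits are all 0 or 2); cantor_like_string and has_evenly_spaced_ones are illustrative names, not taken from any answer here:

```python
def cantor_like_string(p):
    """Binary string of length 3**p with a 1 wherever the 0-based index
    has no digit 1 in its ternary representation."""
    def no_digit_one(i):
        while i:
            if i % 3 == 1:
                return False
            i //= 3
        return True

    return ''.join('1' if no_digit_one(i) else '0' for i in range(3 ** p))


def has_evenly_spaced_ones(s):
    """Pairwise check: for every pair a < b of 1s, look for a 1 at 2*b - a."""
    ones = [i for i, ch in enumerate(s) if ch == '1']
    present = set(ones)
    return any(2 * b - a in present
               for i, a in enumerate(ones)
               for b in ones[i + 1:])


for p in range(1, 6):
    s = cantor_like_string(p)
    # columns: p, string length 3^p, number of ones, whether a triple exists
    print(p, len(s), s.count('1'), has_evenly_spaced_ones(s))
```

    For these small p the number of ones comes out as 2^p and no evenly spaced triple is found, so the pairwise check has to examine on the order of 4^p pairs on such inputs.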

  • 2020-11-28 00:50

    I think this algorithm has O(n log n) complexity (C++, DevStudio 2k5). Now, I don't know the details of how to analyse an algorithm to determine its complexity, so I have added some metric gathering information to the code. The code counts the number of tests done on the sequence of 1's and 0's for any given input (hopefully, I've not made a balls of the algorithm). We can compare the actual number of tests against the O value and see if there's a correlation.

    #include <iostream>
    #include <string>
    using namespace std;
    
    bool HasEvenBits (string &sequence, int &num_compares)
    {
      bool
        has_even_bits = false;
    
      num_compares = 0;
    
      for (unsigned i = 1 ; i <= (sequence.length () - 1) / 2 ; ++i)
      {
        for (unsigned j = 0 ; j < sequence.length () - 2 * i ; ++j)
        {
          ++num_compares;
          if (sequence [j] == '1' && sequence [j + i] == '1' && sequence [j + i * 2] == '1')
          {
            has_even_bits = true;
            // we could 'break' here, but I want to know the worst case scenario so keep going to the end
          }
        }
      }
    
      return has_even_bits;
    }
    
    int main ()
    {
      int
        count;
    
      string
        input = "111";
    
      for (int i = 3 ; i < 32 ; ++i)
      {
        HasEvenBits (input, count);
        cout << i << ", " << count << endl;
        input += "0";
      }
    }
    

    This program outputs the number of tests for each string length from 3 to 31 characters. Here are the results:

     n  Tests  n log10(n)
    =====================
     3     1     1.43
     4     2     2.41
     5     4     3.49
     6     6     4.67
     7     9     5.92
     8    12     7.22
     9    16     8.59
    10    20    10.00
    11    25    11.46
    12    30    12.95
    13    36    14.48
    14    42    16.05
    15    49    17.64
    16    56    19.27
    17    64    20.92
    18    72    22.59
    19    81    24.30
    20    90    26.02
    21   100    27.77
    22   110    29.53
    23   121    31.32
    24   132    33.13
    25   144    34.95
    26   156    36.79
    27   169    38.65
    28   182    40.52
    29   196    42.41
    30   210    44.31
    31   225    46.23
    

    I've added the 'n log n' values as well. Plot these using your graphing tool of choice to see a correlation between the two results. Does this analysis extend to all values of n? I don't know.
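
    One way to probe that last question without plotting is to count the loop iterations directly from the loop bounds. A hedged sketch (count_tests is a hypothetical helper that mirrors the two for loops of HasEvenBits, it is not part of the program above): the counts in the table coincide with floor((n-1)^2 / 4), which grows quadratically rather than like n log n.

```python
def count_tests(n):
    """Number of times the inner loop body of HasEvenBits runs for a string
    of length n: i goes from 1 to (n-1)//2, and j takes n - 2*i values."""
    return sum(n - 2 * i for i in range(1, (n - 1) // 2 + 1))


# A few rows copied from the table above: n -> Tests
table = {3: 1, 4: 2, 5: 4, 10: 20, 16: 56, 20: 90, 25: 144, 31: 225}

for n, tests in table.items():
    assert count_tests(n) == tests
    assert count_tests(n) == (n - 1) ** 2 // 4

# The closed form keeps matching well beyond the table
print(all(count_tests(n) == (n - 1) ** 2 // 4 for n in range(3, 2000)))
```

    So, as far as this count goes, the triple loop behaves like n^2/4 tests, and the resemblance to n log n is limited to small n.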

  • 2020-11-28 00:54

    Finally! Following up leads in sdcvvc's answer, we have it: the O(n log n) algorithm for the problem! It is simple too, after you understand it. Those who guessed FFT were right.

    The problem: we are given a binary string S of length n, and we want to find three evenly spaced 1s in it. For example, S may be 110110010, where n=9. It has evenly spaced 1s at positions 2, 5, and 8.

    1. Scan S left to right, and make a list L of positions of 1. For the S=110110010 above, we have the list L = [1, 2, 4, 5, 8]. This step is O(n). The problem is now to find an arithmetic progression of length 3 in L, i.e. to find distinct a, b, c in L such that b-a = c-b, or equivalently a+c=2b. For the example above, we want to find the progression (2, 5, 8).

    2. Make a polynomial p with terms x^k for each k in L. For the example above, we make the polynomial p(x) = (x + x^2 + x^4 + x^5 + x^8). This step is O(n).

    3. Find the polynomial q = p^2, using the Fast Fourier Transform. For the example above, we get the polynomial q(x) = x^16 + 2x^13 + 2x^12 + 3x^10 + 4x^9 + x^8 + 2x^7 + 4x^6 + 2x^5 + x^4 + 2x^3 + x^2. This step is O(n log n).

    4. Ignore all terms except those corresponding to x^(2k) for some k in L. For the example above, we get the terms x^16, 3x^10, x^8, x^4, x^2. This step is O(n), if you choose to do it at all.

    Here's the crucial point: the coefficient of any x^(2b) for b in L is precisely the number of pairs (a,c) in L such that a+c=2b. [CLRS, Ex. 30.1-7] One such pair is (b,b) always (so the coefficient is at least 1), but if there exists any other pair (a,c), then the coefficient is at least 3, from (a,c) and (c,a). For the example above, we have the coefficient of x^10 to be 3 precisely because of the AP (2,5,8). (These coefficients of x^(2b) will always be odd numbers, for the reasons above. And all other coefficients in q will always be even.)

    So then, the algorithm is to look at the coefficients of these terms x^(2b), and see if any of them is greater than 1. If there is none, then there are no evenly spaced 1s. If there is a b in L for which the coefficient of x^(2b) is greater than 1, then we know that there is some pair (a,c), other than (b,b), for which a+c=2b. To find the actual pair, we simply try each a in L (the corresponding c would be 2b-a) and see if there is a 1 at position 2b-a in S. This step is O(n).
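
    The steps above can be sketched in Python as follows. This is a minimal illustration rather than the answer's own code: fft and find_evenly_spaced are names invented here, positions are 0-based (the text uses 1-based), and the transform is a plain recursive radix-2 FFT on floating-point complex numbers, so the rounding step is only trustworthy for inputs of moderate length.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalised). len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def find_evenly_spaced(s):
    """Return 0-based (a, b, c) with 1s at a < b < c and b - a == c - b, or None."""
    n = len(s)
    ones = [i for i, ch in enumerate(s) if ch == '1']   # step 1: the list L
    if len(ones) < 3:
        return None
    size = 1
    while size < 2 * n:                                  # room for degree 2(n-1)
        size *= 2
    coeffs = [1.0 if i < n and s[i] == '1' else 0.0 for i in range(size)]  # step 2: p
    spectrum = fft(coeffs)
    squared = fft([v * v for v in spectrum], invert=True)  # step 3: q = p^2 (times size)
    for b in ones:                                       # step 4: coefficients of x^(2b)
        count = round(squared[2 * b].real / size)
        if count > 1:                                    # some pair other than (b, b)
            for a in ones:
                c = 2 * b - a
                if a < b and c < n and s[c] == '1':
                    return (a, b, c)
    return None


print(find_evenly_spaced("110110010"))
```

    On the example string this reports the 0-based triple (1, 4, 7), i.e. positions 2, 5 and 8 in the 1-based numbering used above. A production version would more likely use an integer number-theoretic transform or a library FFT to avoid rounding concerns.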

    That's all, folks.


    One might ask: do we need to use FFT? Many answers, such as beta's, flybywire's, and rsp's, suggest that the approach that checks each pair of 1s and sees if there is a 1 at the "third" position might work in O(n log n), based on the intuition that if there are too many 1s, we would find a triple easily, and if there are too few 1s, checking all pairs takes little time. Unfortunately, while this intuition is correct and the simple approach is better than O(n^2), it is not significantly better. As in sdcvvc's answer, we can take the "Cantor-like set" of strings of length n=3^k, with 1s at the positions whose ternary representation has only 0s and 2s (no 1s) in it. Such a string has 2^k = n^((log 2)/(log 3)) ≈ n^0.63 ones in it and no evenly spaced 1s, so checking all pairs would be of the order of the square of the number of 1s in it: that's 4^k ≈ n^1.26, which unfortunately is asymptotically much larger than (n log n). In fact, the worst case is even worse: Leo Moser in 1953 constructed (effectively) such strings which have n^(1-c/√(log n)) 1s in them but no evenly spaced 1s, which means that on such strings, the simple approach would take Θ(n^(2-2c/√(log n))), only a tiny bit better than Θ(n^2), surprisingly!


    About the maximum number of 1s in a string of length n with no 3 evenly spaced ones (which we saw above was at least n^0.63 from the easy Cantor-like construction, and at least n^(1-c/√(log n)) with Moser's construction): this is OEIS A003002. It can also be calculated directly from OEIS A065825 as the k such that A065825(k) ≤ n < A065825(k+1). I wrote a program to find these, and it turns out that the greedy algorithm does not give the longest such string. For example, for n=9, we can get 5 1s (110100011) but the greedy gives only 4 (110110000), for n=26 we can get 11 1s (11001010001000010110001101) but the greedy gives only 8 (11011000011011000000000000), and for n=74 we can get 22 1s (11000010110001000001011010001000000000000000010001011010000010001101000011) but the greedy gives only 16 (11011000011011000000000000011011000011011000000000000000000000000000000000). They do agree at quite a few places until 50 (e.g. all of 38 to 50), though. As the OEIS references say, it seems that Jaroslaw Wroblewski is interested in this question, and he maintains a website on these non-averaging sets. The exact numbers are known only up to 194.
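
    The small cases above can be reproduced with a brute-force sketch. This is not the program mentioned in the answer, just an illustrative reimplementation with made-up names (creates_triple, greedy_ones, max_ones); the exhaustive search is exponential, so it is only practical for small n.

```python
def creates_triple(chosen, pos):
    """True if adding pos (larger than everything in chosen) completes
    an evenly spaced triple a < b < pos with a = 2*b - pos."""
    present = set(chosen)
    return any(2 * b - pos in present for b in chosen)


def greedy_ones(n):
    """0-based positions of the 1s when each position gets a 1 whenever allowed."""
    chosen = []
    for pos in range(n):
        if not creates_triple(chosen, pos):
            chosen.append(pos)
    return chosen


def max_ones(n):
    """Maximum number of 1s in a length-n string with no evenly spaced triple,
    found by backtracking over every position (exponential time)."""
    best = 0

    def extend(pos, chosen):
        nonlocal best
        if pos == n:
            best = max(best, len(chosen))
            return
        if len(chosen) + (n - pos) <= best:
            return  # even taking every remaining position cannot beat best
        if not creates_triple(chosen, pos):
            extend(pos + 1, chosen + [pos])
        extend(pos + 1, chosen)

    extend(0, [])
    return best


def to_string(positions, n):
    marked = set(positions)
    return ''.join('1' if i in marked else '0' for i in range(n))


print(to_string(greedy_ones(9), 9), len(greedy_ones(9)), max_ones(9))
```

    For n=9 this prints the greedy string 110110000 with 4 ones against an optimum of 5, in line with the figures quoted above.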

  • 2020-11-28 00:54

    I'll give my rough guess here, and let those who are better at calculating complexity help me work out how my algorithm fares in O-notation.

    1. Given a binary string 0000010101000100 (as an example).
    2. Crop the zeroes from the head and tail -> 00000 101010001 00
    3. We get 101010001 from the previous step.
    4. Check whether the middle bit is a 'one'. If it is, we have found three evenly spaced 'ones' (this only applies if the number of bits is odd).
    5. Conversely, if the cropped string has an even number of bits, the head and tail 'ones' cannot both be the ends of an evenly spaced triple.
    6. We use 1010100001 as an example (with an extra 'zero' so that the crop has an even number of bits). In this case we need to crop again, which gives -> 10101 00001
    7. We get 10101 from the previous step, check the middle bit, and find the evenly spaced bits again.

    I have no idea how to calculate the complexity of this; can anyone help?

    edit: add some code to illustrate my idea

    edit2: tried to compile my code and found some major mistakes, fixed

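    #include <stdio.h>
    #include <string.h>
    
    const char *binaryStr = "0000010101000100";
    
    /* declared up front because main() calls it before its definition */
    int find3even(int head, int tail);
    
    int main(void) {
       int head, tail, pos;
       head = 0;
       tail = (int)strlen(binaryStr)-1;
       if( (pos = find3even(head, tail)) >=0 )
          printf("found it at position %d\n", pos);
       return 0;
    }
    
    int find3even(int head, int tail) {
       int pos = 0;
       if(head >= tail) return -1;
       /* crop zeroes from both ends; the head<tail test is part of the loop
          condition so an all-zero range cannot loop forever */
       while(head < tail && binaryStr[head] == '0')
          head++;
       while(head < tail && binaryStr[tail] == '0')
          tail--;
       if(head >= tail) return -1;
       if( (tail-head)%2 == 0 && //true if odd numbered
           (binaryStr[head + (tail-head)/2] == '1') ) { 
             return head;
       }else {
          /* otherwise drop one end at a time and try again */
          if( (pos = find3even(head, tail-1)) >=0 )
             return pos;
          if( (pos = find3even(head+1, tail)) >=0 )
             return pos;
       }
       return -1;
    }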
    
  • 2020-11-28 00:55

    One inroad into the problem is to think of factors and shifting.

    With shifting, you compare the string of ones and zeroes with a shifted version of itself. You then take matching ones. Take this example shifted by two:

    1010101010
      1010101010
    ------------
    001010101000
    

    The resulting 1's (bitwise ANDed), must represent all those 1's which are evenly spaced by two. The same example shifted by three:

    1010101010
       1010101010
    -------------
    0000000000000
    

    In this case there are no 1's which are evenly spaced three apart.

    So what does this tell you? Well that you only need to test shifts which are prime numbers. For example say you have two 1's which are six apart. You would only have to test 'two' shifts and 'three' shifts (since these divide six). For example:

    10000010 
      10000010 (Shift by two)
        10000010
          10000010 (We have a match)
    
    10000010
       10000010 (Shift by three)
          10000010 (We have a match)
    

    So the only shifts you ever need to check are 2,3,5,7,11,13 etc. Up to the prime closest to the square root of size of the string of digits.
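
    The shift-and-AND idea is easy to try out by treating the string as one big integer. A hedged sketch (triple_with_gap and gaps_with_triples are illustrative names): note that it ANDs three copies, since a triple needs 1s at i, i+d and i+2d, and that it tries every gap d rather than only primes, because a string such as 100010001 has its only triple at gap 4 and shows nothing at shifts 2 or 3.

```python
def triple_with_gap(s, d):
    """True if s has 1s at positions i, i+d and i+2d for some i.
    The string is read as a binary integer and ANDed with two shifted copies."""
    bits = int(s, 2) if s else 0
    return (bits & (bits >> d) & (bits >> (2 * d))) != 0


def gaps_with_triples(s):
    """All gaps d (not just primes) at which an evenly spaced triple exists."""
    return [d for d in range(1, (len(s) - 1) // 2 + 1) if triple_with_gap(s, d)]


print(gaps_with_triples("1010101010"))  # the example string from above
print(gaps_with_triples("100010001"))   # only a gap of 4 works here
```

    Each shift costs time proportional to the string length (divided by the machine word size), and there are about n/2 gaps to try, so this remains roughly quadratic rather than O(n log n).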

    Nearly solved?

    I think I am closer to a solution. Basically:

    1. Scan the string for 1's. For each 1, note the remainder of its position under each modulus. The modulus ranges from 1 to half the size of the string, because the largest possible separation is half the string. This is done in O(n^2). BUT only prime moduli need be checked, so O(n^2/log(n)).
    2. Sort the list of modulus/remainder pairs, largest modulus first. This can be done in O(n*log(n)) time.
    3. Look for three consecutive modulus/remainder pairs which are the same.
    4. Somehow retrieve the positions of the ones!

    I think the biggest clue to the answer is that the fastest sort algorithms are O(n*log(n)).

    WRONG

    Step 1 is wrong, as pointed out by a colleague. If we have 1's at positions 2, 12 and 102, then taking a modulus of 10 gives them all the same remainder, and yet they are not equally spaced apart! Sorry.
