O(nlogn) Algorithm - Find three evenly spaced ones within binary string

刺人心 2020-11-28 00:07

I had this question on an Algorithms test yesterday, and I can't figure out the answer. It is driving me absolutely crazy, because it was worth about 40 points. I figure

30 Answers
  • 2020-11-28 00:33

    This seemed like a fun problem so I decided to try my hand at it.

    I am making the assumption that 111000001 would find the first 3 ones and be successful. Essentially the number of zeroes following the 1 is the important thing, since 0111000 is the same as 111000 according to your definition. Once you find two cases of 1, the next 1 found completes the trilogy.

    Here it is in Python:

    def find_three(bstring):
        print(bstring)
        # maps a zero-run length to the positions of 1s followed by such a run
        gaps = {}
        lastone = -1
        zerocount = 0
        for i in range(len(bstring)):
            if bstring[i] == '1':
                print(i, ': 1')
                if lastone != -1:
                    if zerocount in gaps:
                        gaps[zerocount].append(lastone)
                        if len(gaps[zerocount]) == 2:
                            gaps[zerocount].append(i)
                            return True, gaps
                    else:
                        gaps[zerocount] = [lastone]
                lastone = i
                zerocount = 0
            else:
                zerocount = zerocount + 1
        # this is really just bookkeeping, as we have failed at this point
        if lastone != -1:
            if zerocount in gaps:
                gaps[zerocount].append(lastone)
            else:
                gaps[zerocount] = [lastone]
        return False, gaps
    

    This is a first try, so I'm sure this could be written in a cleaner manner. Please list the cases where this method fails down below.

  • 2020-11-28 00:34

    Here are some thoughts that, despite my best efforts, will not seem to wrap themselves up in a bow. Still, they might be a useful starting point for someone's analysis.

    Consider the proposed solution as follows, which is the approach that several folks have suggested, including myself in a prior version of this answer. :)

    1. Trim leading and trailing zeroes.
    2. Scan the string looking for 1's.
    3. When a 1 is found:
      1. Assume that it is the middle 1 of the solution.
      2. For each prior 1, use its saved position to compute the anticipated position of the final 1.
      3. If the computed position is after the end of the string it cannot be part of the solution, so drop the position from the list of candidates.
      4. Check the solution.
    4. If the solution was not found, add the current 1 to the list of candidates.
    5. Repeat until no more 1's are found.
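
The steps above can be sketched in Python as follows. This is only an illustration of the candidate-list idea, not a claim about its running time; the name `find_triple` and the choice to report positions relative to the untrimmed input are assumptions made for the example.

```python
def find_triple(bits):
    """Return (first, middle, last) positions of evenly spaced 1s, or None."""
    # Step 1: trim leading and trailing zeroes, remembering the shift
    # so that reported positions refer to the original string.
    offset = len(bits) - len(bits.lstrip("0"))
    s = bits.strip("0")

    candidates = []  # positions of earlier 1s that can still start a triple
    for mid, ch in enumerate(s):
        if ch != "1":
            continue
        kept = []
        for first in candidates:
            last = 2 * mid - first  # where the third 1 would have to be
            if last >= len(s):
                # Later middles only push `last` further out, so drop `first`.
                continue
            if s[last] == "1":
                return first + offset, mid + offset, last + offset
            kept.append(first)
        kept.append(mid)  # the current 1 becomes a candidate as well
        candidates = kept
    return None


print(find_triple("0100100100"))  # positions are 0-based
```

Dropping a candidate is safe because the anticipated third position only grows as the middle position moves right.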

    Now consider input strings like the following, which will not have a solution:

    101
    101001
    1010010001
    101001000100001
    101001000100001000001
    

    In general, this is the concatenation of k strings of the form j 0's followed by a 1 for j from zero to k-1.

    k=2  101
    k=3  101001
    k=4  1010010001
    k=5  101001000100001
    k=6  101001000100001000001
    

    Note that the lengths of the substrings are 1, 2, 3, etc. So, problem size n has substrings of lengths 1 to k such that n = k(k+1)/2.

    k=2  n= 3  101
    k=3  n= 6  101001
    k=4  n=10  1010010001
    k=5  n=15  101001000100001
    k=6  n=21  101001000100001000001
    

    Note that k also tracks the number of 1's that we have to consider. Remember that every time we see a 1, we need to consider all the 1's seen so far. So when we see the second 1, we only consider the first, when we see the third 1, we reconsider the first two, when we see the fourth 1, we need to reconsider the first three, and so on. By the end of the algorithm, we've considered k(k-1)/2 pairs of 1's. Call that p.

    k=2  n= 3  p= 1  101
    k=3  n= 6  p= 3  101001
    k=4  n=10  p= 6  1010010001
    k=5  n=15  p=10  101001000100001
    k=6  n=21  p=15  101001000100001000001
    

    The relationship between n and p is that n = p + k.
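
The tables above can be reproduced with a short Python sketch. The helper names `staircase` and `find_triple_brute_force` are made up for this example; the brute-force check simply tries every pair of 1's and is only meant to confirm the small cases listed.

```python
def staircase(k):
    """Concatenate j zeroes followed by a 1, for j = 0 .. k-1."""
    return "".join("0" * j + "1" for j in range(k))


def find_triple_brute_force(s):
    """Return (first, middle, last) positions of evenly spaced 1s, or None."""
    ones = [i for i, ch in enumerate(s) if ch == "1"]
    one_set = set(ones)
    for idx, first in enumerate(ones):
        for middle in ones[idx + 1:]:
            last = 2 * middle - first
            if last in one_set:
                return first, middle, last
    return None


for k in range(2, 7):
    s = staircase(k)
    n = len(s)
    p = k * (k - 1) // 2  # number of pairs of 1's
    assert n == k * (k + 1) // 2 == p + k
    print(f"k={k}  n={n:2d}  p={p:2d}  {s}  triple={find_triple_brute_force(s)}")
```

For k from 2 to 7 the check finds no triple. Running it for k=8 does find one, at positions 5, 20 and 35, so the listed strings are best treated as small examples rather than as a family that stays solution-free for every k.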

    The process of going through the string takes O(n) time. Each time a 1 is encountered, a maximum of (k-1) comparisons are done. Since n = k(k+1)/2, n > k**2/2, so k < sqrt(2n). This gives us O(n sqrt(n)) or O(n**(3/2)). Note however that this may not be a really tight bound, because the number of comparisons goes from 1 to a maximum of k; it isn't k the whole time. But I'm not sure how to account for that in the math.

    It still isn't O(n log(n)). Also, I can't prove those inputs are the worst cases, although I suspect they are. I think a denser packing of 1's to the front results in an even sparser packing at the end.

    Since someone may still find it useful, here's my code for that solution in Perl:

    #!/usr/bin/perl
    
    # read input as first argument
    my $s = $ARGV[0];
    
    # validate the input
    $s =~ /^[01]+$/ or die "invalid input string\n";
    
    # strip leading and trailing 0's
    $s =~ s/^0+//;
    $s =~ s/0+$//;
    
    # prime the position list with the first '1' at position 0
    my @p = (0);
    
    # start at position 1, which is the second character
    my $i = 1;
    
    print "the string is $s\n\n";
    
    while ($i < length($s)) {
       if (substr($s, $i, 1) eq '1') {
          print "found '1' at position $i\n";
          my @t = ();
          # assuming this is the middle '1', go through the positions
          # of all the prior '1's and check whether there's another '1'
          # in the correct position after this '1' to make a solution
          while (scalar @p) {
             # $p is the position of the prior '1'
             my $p = shift @p;
             # $j is the corresponding position for the following '1'
             my $j = 2 * $i - $p;
             # if $j is off the end of the string then we don't need to
             # check $p anymore
             next if ($j >= length($s));
             print "checking positions $p, $i, $j\n";
             if (substr($s, $j, 1) eq '1') {
                print "\nsolution found at positions $p, $i, $j\n";
                exit 0;
             }
             # if $j isn't off the end of the string, keep $p for next time
             push @t, $p;
          }
          @p = @t;
          # add this '1' to the list of '1' positions
          push @p, $i;
       }
       $i++;
    }
    
    print "\nno solution found\n";
    
  • 2020-11-28 00:35

    Below is a solution. There could be some little mistakes here and there, but the idea is sound.

    Edit: It's not n * log(n)

    PSEUDO CODE:

    foreach character in the string
      if the character equals 1 {         
         if length cache > 0 { //we can skip the first one
            foreach location in the cache { //last in first out kind of order
               if ((currentlocation + (currentlocation - location)) < length string) {
                  if (string[(currentlocation + (currentlocation - location))] equals 1)
                     return found evenly spaced string
               } else
                  break;
            }
         }
         remember the location of this character in some sort of cache.
      }
    
    return didn't find evenly spaced string
    

    C# code:

    public static Boolean FindThreeEvenlySpacedOnes(String str) {
        List<int> cache = new List<int>();
    
        for (var x = 0; x < str.Length; x++) {
            if (str[x] == '1') {
                if (cache.Count > 0) {
                    for (var i = cache.Count - 1; i >= 0; i--) { // i >= 0 so the earliest 1 (cache[0]) is also checked
                        if ((x + (x - cache[i])) >= str.Length)
                            break;
    
                        if (str[(x + (x - cache[i]))] == '1')
                            return true;                            
                    }
                }
                cache.Add(x);                    
            }
        }
    
        return false;
    }
    

    How it works:

    iteration 1:
    x
    |
    101101001
    // the location of this 1 is stored in the cache
    
    iteration 2:
     x
     | 
    101101001
    
    iteration 3:
    a x b 
    | | | 
    101101001
    //we retrieve location a out of the cache and then based on a 
    //we calculate b and check if the string contains a 1 on location b
    
    //and of course we store x in the cache because it's a 1
    
    iteration 4:
      axb  
      |||  
    101101001
    
    a  x  b  
    |  |  |  
    101101001
    
    
    iteration 5:
        x  
        |  
    101101001
    
    iteration 6:
       a x b 
       | | | 
    101101001
    
      a  x  b 
      |  |  | 
    101101001
    //return found evenly spaced string
    
  • 2020-11-28 00:36

    Your problem is called AVERAGE in this paper (1999):

    A problem is 3SUM-hard if there is a sub-quadratic reduction from the problem 3SUM: Given a set A of n integers, are there elements a,b,c in A such that a+b+c = 0? It is not known whether AVERAGE is 3SUM-hard. However, there is a simple linear-time reduction from AVERAGE to 3SUM, whose description we omit.

    Wikipedia:

    When the integers are in the range [−u ... u], 3SUM can be solved in time O(n + u lg u) by representing S as a bit vector and performing a convolution using FFT.

    This is enough to solve your problem :).
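
As a rough sketch of the convolution idea in Python: square the bit vector as a polynomial, so that coefficient m counts ordered pairs of 1-positions (i, k) with i + k = m, then look at coefficient 2j for every position j that holds a 1. The function names and the small recursive FFT below are assumptions made to keep the example self-contained; a library FFT would normally be used instead, and floating-point rounding limits this version to moderate input sizes.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def has_evenly_spaced_ones(bits):
    """True if some 1s sit at positions i < j < k with j - i == k - j."""
    n = len(bits)
    size = 1
    while size < 2 * n:  # room for every index sum up to 2 * (n - 1)
        size *= 2
    a = [1.0 if c == "1" else 0.0 for c in bits] + [0.0] * (size - n)
    squared = [x * x for x in fft(a)]
    # conv[m] = number of ordered pairs (i, k) of 1-positions with i + k == m
    conv = [round((v / size).real) for v in fft(squared, invert=True)]
    for j, c in enumerate(bits):
        # The pair (j, j) always contributes 1; anything more means a
        # pair i != k of 1s with midpoint j.
        if c == "1" and conv[2 * j] > 1:
            return True
    return False


print(has_evenly_spaced_ones("101101001"))
```

The work is dominated by three transforms of length at most 4n, which is where the O(n log n) bound in terms of string length comes from.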

    What is very important is that O(n log n) is the complexity in terms of the number of zeroes and ones, not the count of ones (which could be given as an array, like [1,5,9,15]). Checking whether a set has an arithmetic progression, in terms of the number of 1's, is hard, and according to that paper, as of 1999 no algorithm faster than O(n^2) is known, and it is conjectured that none exists. Everybody who doesn't take this into account is attempting to solve an open problem.

    Other interesting info, mostly irrelevant:

    Lower bound:

    An easy lower bound is a Cantor-like set (numbers 1..3^n-1 not containing 1 in their ternary expansion) - its density is n^(log_3 2) (the exponent is circa 0.631). So checking whether the set is too large, and otherwise checking all pairs, is not enough to get O(n log n). You have to investigate the sequence in a smarter way. A better lower bound is quoted here - it's n^(1 - c/(log n)^(1/2)). This means the Cantor set is not optimal.
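
To make the Cantor-like construction concrete, here is a small Python sketch that builds the set for a given number of ternary digits and brute-force checks that it contains no three-term arithmetic progression. The function names are invented for this example, and the set here starts at 0 rather than 1, which does not affect the property being checked.

```python
def cantor_like(digits):
    """Integers below 3**digits whose base-3 digits are all 0 or 2."""
    result = []
    for m in range(3 ** digits):
        x = m
        while x and x % 3 != 1:
            x //= 3
        if x == 0:  # every digit was inspected and none was a 1
            result.append(m)
    return result


def has_three_term_ap(values):
    """Brute-force test for a < b < c in values with b - a == c - b."""
    present = set(values)
    ordered = sorted(present)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if 2 * b - a in present:
                return True
    return False


for d in range(1, 6):
    s = cantor_like(d)
    print(d, 3 ** d, len(s), has_three_term_ap(s))
```

Among the 3^d integers below 3^d the set keeps 2^d of them, which matches the exponent log_3 2 quoted above.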

    Upper bound - my old algorithm:

    It is known that for large n, a subset of {1,2,...,n} not containing an arithmetic progression has at most n/(log n)^(1/20) elements. The paper On triples in arithmetic progression proves more: the set cannot contain more than n * 2^28 * (log log n / log n)^(1/2) elements. So you could check if that bound is achieved and if not, naively check pairs. This is an O(n^2 * log log n / log n) algorithm, faster than O(n^2). Unfortunately "On triples..." is on Springer - but the first page is available, and Ben Green's exposition is available here, page 28, theorem 24.

    By the way, the papers are from 1999 - the same year as the first one I mentioned, so that's probably why the first one doesn't mention that result.

  • 2020-11-28 00:36

    I assume the reason this is nlog(n) is due to the following:

    • To find the 1 that is the start of the triplet, you need to check (n-2) characters. If you haven't found it by that point, you won't (chars n-1 and n cannot start a triplet) (O(n))
    • To find the second 1 that is the part of the triplet (started by the first one), you need to check m/2 (m=n-x, where x is the offset of the first 1) characters. This is because, if you haven't found the second 1 by the time you're halfway from the first one to the end, you won't... since the third 1 must be exactly the same distance past the second. (O(log(n)))
    • It is O(1) to find the last 1 since you know the index it must be at by the time you find the first and second.

    So, you have n, log(n), and 1... O(nlogn)

    Edit: Oops, my bad. My brain had it set that n/2 was logn... which it obviously isn't (doubling the number of items still doubles the number of iterations on the inner loop). This is still at n^2, not solving the problem. Well, at least I got to write some code :)


    Implementation in Tcl

    proc get-triplet {input} {
        for {set first 0} {$first < [string length $input]-2} {incr first} {
            if {[string index $input $first] != 1} {
                continue
            }
            set start [expr {$first + 1}]
            set end [expr {1+ $first + (([string length $input] - $first) /2)}]
            for {set second $start} {$second < $end} {incr second} {
                if {[string index $input $second] != 1} {
                    continue
                }
                set last [expr {($second - $first) + $second}]
                if {[string index $input $last] == 1} {
                    return [list $first $second $last]
                }
            }
        }
        return {}
    }
    
    get-triplet 10101      ;# 0 2 4
    get-triplet 10111      ;# 0 2 4
    get-triplet 11100000   ;# 0 1 2
    get-triplet 0100100100 ;# 1 4 7
    
  • 2020-11-28 00:37

    I'll try to present a mathematical approach. This is more a beginning than an end, so any help, comment, or even contradiction - will be deeply appreciated. However, if this approach is proven - the algorithm is a straight-forward search in the string.

    1. Given a fixed number of spaces k and a string S, the search for a k-spaced triplet takes O(n): we simply test, for every 0 <= i < n-2k, whether S[i], S[i+k] and S[i+2k] are all 1. The test takes O(1) and we do it n-2k times where k is a constant, so it takes O(n-2k)=O(n).

    2. Let us assume that there is an inverse proportion between the number of 1's and the maximum spacing we need to search for. That is, if there are many 1's, there must be a triplet and it must be quite dense; if there are only a few 1's, the triplet (if any) can be quite sparse. In other words, I can prove that if I have enough 1's, such a triplet must exist, and the more 1's I have, the denser the triplet that must be found. This can be explained by the Pigeonhole principle - hope to elaborate on this later.

    3. Say I have an upper bound k on the possible number of spaces I have to look for. Now, for each 1 located in S[i] we need to check for 1 in S[i-1] and S[i+1], S[i-2] and S[i+2], ... S[i-k] and S[i+k]. This takes O(k) for each 1 in S, since there are k symmetric pairs to test. Note that this differs from section 1 - I'm having k as an upper bound for the number of spaces, not as a constant space.
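
Item 1 of the list above, the scan for one fixed gap k, might look like this in Python. It is a minimal sketch assuming 0-based indexing and k >= 1; `find_k_spaced` is a name chosen for the example.

```python
def find_k_spaced(s, k):
    """Return the index of the first 1 of a triple with gap exactly k, or None."""
    # i + 2*k must stay inside the string, so i ranges over 0 .. len(s) - 2*k - 1.
    for i in range(len(s) - 2 * k):
        if s[i] == s[i + k] == s[i + 2 * k] == "1":
            return i
    return None


print(find_k_spaced("0100100100", 3))
```

Each test is constant time and the loop runs fewer than n times, which is the O(n) cost for a single fixed gap.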

    We need to prove O(n*log(n)). That is, we need to show that k*(number of 1's) is O(n*log(n)).

    If we can do that, the algorithm is trivial - for each 1 in S whose index is i, simply look for 1's from each side up to distance k. If two were found in the same distance, return i and k. Again, the tricky part would be finding k and proving the correctness.
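
Assuming such a bound k were known, the search described above could be sketched as follows. The function name `find_within` is hypothetical, and the sketch says nothing about how k would be derived or proven sufficient, which is the open part of this approach.

```python
def find_within(s, k):
    """Return (i, d): a 1 at index i with 1s at i-d and i+d, d <= k, or None."""
    n = len(s)
    for i, ch in enumerate(s):
        if ch != "1":
            continue
        for d in range(1, k + 1):
            if i - d < 0 or i + d >= n:
                break  # larger distances would also fall outside the string
            if s[i - d] == "1" and s[i + d] == "1":
                return i, d
    return None


print(find_within("0100100100", 3))
```

The inner loop performs at most k pair tests for each 1, so the total work is bounded by k times the number of 1's.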

    I would really appreciate your comments here - I have been trying to find the relation between k and the number of 1's on my whiteboard, so far without success.
