Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

前端未结
关注
 30  1767
时光说笑 2020-11-22 07:02
I had an interesting job interview experience a while back. The question started really easy:
Q1: We have a bag containing numbers

      
      
        
          30条回答        

        
                    
            
            
                         
                
              
              
                
                   耶瑟儿～
                                             
                
                
                (楼主)
            
              
              
                2020-11-22 07:49
              

            
            
                        

Motivation
If you want to solve the general-case problem, and you can store and edit the array, then Caf's solution is by far the most efficient. If you can't store the array (streaming version), then sdcvvc's answer is the only type of solution currently suggested.
The solution I propose is the most efficient answer (so far on this thread) if you can store the problem but can't edit it, and I got the idea from Svalorzen's solution, which solves for 1 or 2 missing items. This solution takes Θ(k*n) time and O(k) and Ω(log(k)) space - with a possibility that it might actually be O(min(k,log(n))) space. It also works well with parallelism.
Concept
The idea is that if you use the original approach of comparing sums:

int sum = SumOf(1,n) - SumOf(array)
... then you take the average of the missing numbers:

average = sum/array_size
... which provides a boundary: Of the missing numbers, there's guaranteed to be at least one number less-or-equal to average, and at least one number greater than average. This means that we can split into sub problems that each scan the array [O(n)] and are only concerned with their respective sub-arrays.
Code
C-style solution (don't judge me for the global variables, I'm just trying to make the code readable for non-c folks):
#include "stdio.h"

// Example problem:
const int array [] = {0, 7, 3, 1, 5};
const int N = 8; // size of original array
const int array_size = 5;

int SumOneTo (int n)
{
    return n*(n-1)/2; // non-inclusive
}

int MissingItems (const int begin, const int end, int & average)
{
    // We consider only sub-array where elements, e:
    // begin <= e < end
    
    // Initialise info about missing elements.
    // First assume all are missing:
    int n = end - begin;
    int sum = SumOneTo(end) - SumOneTo(begin);

    // Minus everything that we see (ie not missing):
    for (int i = 0; i < array_size; ++i)
    {
        if ((begin <= array[i]) && (array[i] < end))
        {
            n -= 1;
            sum -= array[i];
        }
    }
    
    // used by caller:
    average = sum/n;
    return n;
}

void Find (const int begin, const int end)
{
    int average;

    if (MissingItems(begin, end, average) == 1)
    {
        printf(" %d", average); // average(n) is same as n
        return;
    }
    
    Find(begin, average + 1); // at least one missing here
    Find(average + 1, end); // at least one here also
}

int main ()
{   
    printf("Missing items:");
    
    Find(0, N);
    
    printf("\n");
}

Analysis
Ignoring recursion for a moment, each function call clearly takes O(n) time and O(1) space. Note that sum can equal as much as n(n-1)/2, so requires double the amount of bits needed to store n-1. At most this means than we effectively need two extra elements worth of space, regardless of the size of the array or k, hence it's still O(1) space under the normal conventions.
It's not so obvious how many function calls there are for k missing elements, so I'll provide a visual. Your original sub-array (connected array) is the full array, which has all k missing elements in it. We'll imagine them in increasing order, where -- represent connections (part of same sub-array):
m₁ -- m₂ -- m₃ -- m₄ -- (...) -- m_k-1 -- m_k
The effect of the Find function is to disconnect the missing elements into different non-overlapping sub-arrays. It guarantees that there's at least one missing element in each sub-array, which means breaking exactly one connection.
What this means is that regardless of how the splits occur, it will always take k-1 Find function calls to do the work of finding the sub-arrays that have only one missing element in it.
So the time complexity is Θ((k-1 + k) * n) = Θ(k*n).
For the space complexity, if we divide proportionally each time then we get O(log(k)) space complexity, but if we only separate one at a time it gives us O(k).
Discussion
I actually suspect we the space complexity is a smaller O(min(k,log(n))), but it's harder to prove. My intuition: Where the average performs badly at separation is when there's an outlier, but because of this the separation then removes that outlier. In normal arrays, elements could all be exponentially different, but in this case they're all bound by n.
    
             
                                                        
            
            
              
                
                0
              
                   
                
               讨论(0)
              
                                                  
              
              
                          
             
       
          
              
                                       
     查看其它30个回答


            
                         
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
                              			
        
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复