Easy interview question got harder: given numbers 1..100, find the missing number(s) given exactly k are missing

时光说笑 2020-11-22 07:02
I had an interesting job interview experience a while back. The question started really easy:
Q1: We have a bag containing numbers

      
      
        
30 answers

清歌不尽 (original poster) 2020-11-22 08:00

There is a general technique for extending streaming algorithms like this one.
The idea is to use a bit of randomization to 'spread' the k missing elements into independent subproblems, each of which the original single-missing-number algorithm can solve. This technique is used in sparse signal reconstruction, among other things.


Make an array, a, of size u = k^2, initialised to zero.
Pick any universal hash function, h : {1,...,n} -> {1,...,u} (multiply-shift, for example).
For each i in 1, ..., n, add i to its bucket: a[h(i)] += i.
For each number x in the input stream, subtract it from its bucket: a[h(x)] -= x.


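Every number that is present cancels its own contribution, so each bucket ends up holding the sum of the missing numbers hashed to it. If all of the missing numbers have been hashed to different buckets, the non-zero elements of the array are exactly the missing numbers.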
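
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the answer's own code: it swaps in a Carter-Wegman style hash ((a*x + b) mod p) mod u instead of multiply-shift, uses 0-based buckets, and adds a second array, counts, that is not part of the description above; it is there only so that collisions can be detected. The names make_hash, sketch_missing and find_missing are hypothetical.

```python
import random

P = 2_147_483_647  # the prime 2^31 - 1; assumed to be larger than n


def make_hash(u, rng):
    """Draw h(x) = ((a*x + b) mod P) mod u from a universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % u


def sketch_missing(stream, n, k, rng):
    """One pass over the stream, keeping u = k^2 buckets."""
    u = max(1, k * k)
    h = make_hash(u, rng)
    sums = [0] * u    # the array "a" from the text
    counts = [0] * u  # extra bookkeeping (an assumption): missing numbers per bucket
    for i in range(1, n + 1):
        sums[h(i)] += i
        counts[h(i)] += 1
    for x in stream:
        sums[h(x)] -= x
        counts[h(x)] -= 1
    return sums, counts


def find_missing(numbers, n, k, seed=0, max_tries=64):
    """Retry with fresh hash functions until all k missing numbers are isolated."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        sums, counts = sketch_missing(numbers, n, k, rng)
        # A bucket with count 1 holds exactly one missing number, and its sum is that number.
        found = sorted(s for s, c in zip(sums, counts) if c == 1)
        if len(found) == k:
            return found
    raise RuntimeError("no collision-free hash found; try a larger array")


if __name__ == "__main__":
    numbers = [i for i in range(1, 101) if i not in (7, 42, 93)]
    print(find_missing(numbers, 100, 3))
```

Each retry here re-reads the input, which a true one-pass stream does not allow; in that setting one would presumably run several independent copies of the sketch side by side instead.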

The probability that a particular pair of missing numbers is sent to the same bucket is at most 1/u, by the definition of a universal hash function. Since there are k(k-1)/2 < k^2/2 pairs, a union bound puts the error probability below (k^2/2)/u = 1/2. That is, we succeed with probability at least 50%, and increasing u improves our chances.
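
To spell out this bound, and to see what enlarging the array buys, here is the union bound written out. The constant c is an assumption introduced only for illustration: it scales the array to u = c k^2, and the text above corresponds to c = 1.

```latex
\Pr[\text{some pair collides}]
  \;\le\; \binom{k}{2}\cdot\frac{1}{u}
  \;=\; \frac{k(k-1)}{2\,c\,k^{2}}
  \;<\; \frac{1}{2c},
\qquad u = c\,k^{2}.
```

With c = 1 this is the 1/2 bound; taking c = 5, for example, pushes the failure probability below 10% at five times the space.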

Note that this algorithm takes O(k^2 log n) bits of space (each array bucket needs O(log n) bits). This matches the space required by @Dimitris Andreou's answer (in particular the space requirement of polynomial factorization, which also happens to be randomized).
This algorithm also takes constant time per update, rather than O(k) time per update as with power sums.
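
To make the cost comparison concrete, here is a small Python sketch. All names are hypothetical, the toy hash x mod 4 is not universal (it is picked only because it keeps 3 and 8 apart), and bucket_bits assumes each bucket must be able to hold a sum as large as 1 + 2 + ... + n.

```python
def power_sum_update(sums, x, sign):
    """Touches all k accumulators: sums[j] tracks the sum of x**(j+1)."""
    for j in range(len(sums)):
        sums[j] += sign * x ** (j + 1)


def bucket_update(a, h, x, sign):
    """Touches exactly one bucket, whatever k is."""
    a[h(x)] += sign * x


def bucket_bits(n, k):
    """Bits for k^2 buckets, each wide enough for the sum 1 + 2 + ... + n."""
    return k * k * (n * (n + 1) // 2).bit_length()


n, k, missing = 10, 2, {3, 8}
h = lambda x: x % (k * k)  # toy hash for illustration only
p = [0] * k                # power sums
a = [0] * (k * k)          # hashed buckets

for i in range(1, n + 1):
    power_sum_update(p, i, +1)
    bucket_update(a, h, i, +1)
for x in range(1, n + 1):
    if x not in missing:
        power_sum_update(p, x, -1)
        bucket_update(a, h, x, -1)

print(p)                    # [11, 73]  (3 + 8 and 3^2 + 8^2)
print(a)                    # [8, 0, 0, 3]
print(bucket_bits(100, 3))  # 117
```

The power sums still have to be turned into the missing numbers by solving a polynomial system, whereas the buckets can be read off directly once they are collision-free.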

In fact, we can be even more efficient than the power sum method by using the trick described in the comments.
    
             
                                                        
            
            
              
                