I had an interesting job interview experience a while back. The question started really easy:
Q1: We have a bag containing numbers
To solve the 2 (and 3) missing numbers question, you can modify quickselect, which on average runs in O(n)
and uses constant memory if partitioning is done in-place.
Partition the set with respect to a random pivot p into partitions l, which contains the numbers smaller than the pivot, and r, which contains the numbers greater than the pivot.
Determine which partitions the 2 missing numbers are in by comparing the pivot value to the size of each partition (p - 1 - count(l) = count of missing numbers in l and n - count(r) - p = count of missing numbers in r).
a) If each partition is missing one number, then use the difference of sums approach to find each missing number: (1 + 2 + ... + (p-1)) - sum(l) = missing #1 and ((p+1) + (p+2) + ... + n) - sum(r) = missing #2.
b) If one partition is missing both numbers and the partition is empty, then the missing numbers are either (p-1,p-2) or (p+1,p+2), depending on which partition is missing the numbers.
If one partition is missing 2 numbers but is not empty, then recurse onto that partition.
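To make the counting and difference-of-sums steps concrete, here is a small Python sketch of a single pass. The function name `one_pass` and the returned dictionary keys are hypothetical, chosen only for this illustration, and the pivot is passed in explicitly instead of being picked at random so the numbers are easy to follow.

```python
def one_pass(nums, n, p):
    """One partitioning pass over nums (distinct values from 1..n) around pivot p."""
    l = [x for x in nums if x < p]   # numbers smaller than the pivot
    r = [x for x in nums if x > p]   # numbers greater than the pivot

    missing_in_l = (p - 1) - len(l)  # 1..p-1 should hold p-1 numbers
    missing_in_r = (n - p) - len(r)  # p+1..n should hold n-p numbers

    result = {"missing_in_l": missing_in_l, "missing_in_r": missing_in_r}
    if missing_in_l == 1 and missing_in_r == 1:
        # Case a): expected sum of each range minus the actual sum.
        result["from_l"] = (p - 1) * p // 2 - sum(l)
        result["from_r"] = (n * (n + 1) // 2 - p * (p + 1) // 2) - sum(r)
    return result


bag = [1, 2, 3, 5, 6, 8, 9, 10]   # 1..10 with 4 and 7 removed
print(one_pass(bag, 10, 5))
# {'missing_in_l': 1, 'missing_in_r': 1, 'from_l': 4, 'from_r': 7}
```

With pivot 3 instead, the same bag gives 0 missing in l and 2 missing in r, so l would be discarded and the pass repeated on r.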
With only 2 missing numbers, this algorithm always discards at least one partition, so it retains the O(n) average time complexity of quickselect. Similarly, with 3 missing numbers this algorithm also discards at least one partition with each pass (because, as with 2 missing numbers, at most one partition will contain multiple missing numbers). However, I'm not sure how much the performance decreases when more missing numbers are added.
Here's an implementation that does not use in-place partitioning, so this example does not meet the space requirement but it does illustrate the steps of the algorithm:
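A minimal Python sketch along those lines, written for illustration rather than taken from the original answer. The name `find_missing` and the `lo`/`hi` range parameters are assumptions of this sketch. It also folds the cases above into three checks on each sub-range: nothing missing (discard), an empty partition (everything in the range is missing), and exactly one missing (difference of sums). Otherwise it partitions again.

```python
import random


def find_missing(nums, lo, hi):
    """Return the values of lo..hi absent from nums, in ascending order.

    Assumes nums holds distinct integers, all within lo..hi.
    """
    expected = hi - lo + 1
    k = expected - len(nums)          # how many numbers are missing in lo..hi
    if k == 0:
        return []                     # nothing missing here: discard this partition
    if not nums:
        return list(range(lo, hi + 1))  # empty partition: the whole range is missing
    if k == 1:
        # Difference of sums: expected sum of lo..hi minus the actual sum.
        return [(lo + hi) * expected // 2 - sum(nums)]

    p = random.choice(nums)           # random pivot taken from the numbers present
    l = [x for x in nums if x < p]    # not in place: uses O(n) extra memory
    r = [x for x in nums if x > p]
    return find_missing(l, lo, p - 1) + find_missing(r, p + 1, hi)


bag = [x for x in range(1, 101) if x not in (17, 58)]
random.shuffle(bag)
print(find_missing(bag, 1, 100))  # [17, 58]
```

A partition with nothing missing is rejected by the length check alone, before any partitioning, which is what keeps the work per level proportional to the partitions that still matter.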