Distinguishing extra element from two arrays?

问题

One of my friend was asked this question in an interview -

You have given two integer arrays each of size 10.
Both contains 9 equal elements (say 1 to 9)
Only one element is different.

How will you find the different element? What are different approaches you can take?

One simple but lengthy approach would be - sort both arrays,go on comparing each element,on false comparison, you'll get your result.

So what are different approaches for this? Specify logic as its expected in an interview. Not expecting a particular code in a specific language. Pseudo code will suffice.

(Please submit one approach per answer)

My purpose of asking this question is, its OK when array sizes are small.But when array size increases, you must think of a very efficient n faster way.Its never desirable to use comparisons in such case.

回答1:

If you need this to scale, then I would use one of the many Set implementations in the world. For example, Java's HashSet.

Throw all of the first array in the Set. Then, for each member of the second array, if it is contained in the Set, remove it; otherwise mark it as Unique #2. After this procedure, the last remaining member of the Set is Unique #1.

I'd probably do it this way, even on an interview, and even for simple ten-element arrays. Life is too short to spend trying to find the clever way to scale a wall when there's a perfectly good door in it.

回答2:

Here is a mathematical approach, inspired by Kevin's answer and its comments.

Let's call the arrays A and B and let their unique elements be a and b, respectively. First, take the sums of both arrays and subtract one from the other; since everything else cancels, sum(A) - sum(B) = a - b = s. Then, multiply the elements of both arrays and divide one by the other. Again, things cancel, so mult(A) / mult(B) = a / b = r. Now, from these, we get a = rb, so rb - b = s or b = s / (r - 1) and then a = rs / (r - 1).

I'm calling this mathematical because multiplying things out might not be a reasonable thing to do in a real program. The key is to have two different operations that both individually allow the canceling behavior and so that one distributes over the other. This latter property is used when going from rb - b = s to b = s / (r - 1), and that won't work, say, with addition and XOR, which was my first attempt.

回答3:

This can be solved quickly from just the sum and the sum of the squares of the two sequences. And calculating these sums will certainly be faster than the hashes that are suggested, and doesn't involve any comparisons between the sequence items.

Here's how to do it: If the two sets are {a_i} and {b_i}, then call A and B their sums, and A2 and B2 are the sum of the squares, i.e. A2 = Sum({a_i²}), and for convenience, D=A-B, and D2=A2-B2. Therefore, D=a-b and D2=a²-b², where a and b are the two elements that are different, and from this we see

a = (D²+D2)/(2*D)
b = a - D

This works out because, from algebra, a²-b²=(a+b)(a-b) or D2=(a+b)D, so a+b=D2/D, and since we also know a-b, we can find a and b.

An example in Python may be more convincing

a, b = 5, 22   # the initial unmatched terms
x, y = range(15), range(15)
y[a] = b
print "x =", x  # x = [0, 1, 2, 3, 4,  5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
print "y =", y  # y = [0, 1, 2, 3, 4, 22, 6, 7, 8, 9, 10, 11, 12, 13, 14]

D = sum(x) - sum(y)
D2 = sum([i**2 for i in x]) - sum([i**2 for i in y])  #element-wise squaring
a = (D2+D*D)/(2*D)
b = a - D

print "a=%i, b=%i" % (a, b)
#prints a=5, b=22  which is correct

(Of course, this is somewhat similar to jk's answer, except it doesn't require the multiplication of all the terms and the huge numbers that would result, but thanks to jk for the idea of a mathematical approach.)

回答4:

Technically, you can do it in constant time, since the arrays (and values within them) are limited. For the generalized problem, we have to make out something trickier.

Here's a linear-time solution.

First, we should build a hash based on one array. Value lookup in a hash table takes O(1 + k/n) comparisons [1], where k is the length of hash table. Thus iteration over the first array (that contains n elements) and lookups for each value takes O(n+k).

Then we iterate the other one, looking up for each element in the hash. When an element is not found -- that's unique one from the other array. (O(n+k) again). Then we iterate hash to look for the second unique element (O(k)).

Total time is O(n+k). Since it doesn't make sense to let k be more than n, it's the linear solution.

Perl code for that:

sub unique
{
  my ($arr, $brr) = @_;
  my %hash = map{$_ => 1} @$arr;
  %hash{$_}-- for @$brr;
  return grep {$_} keys %hash;
}

回答5:

In LINQ:

var unique1 = (from a in arrayA where !arrayB.Contains(a) select a).First();
var unique2 = (from b in arrayB where !arrayA.Contains(b) select b).First();
return new Pair(unique1, unique2);

...

public sealed class Pair<T0, T1>
{
    public T0 Item1 {get;set;}
    public T1 Item2 {get;set;}
    public Pair(T0 item1, T1 item2)
    {
        Item1 = item1;
        Item2 = item2;
    }
    //plus GetHashCode, equality etc.
}

回答6:

Here's another possibility. Unlike my previous answer it doesn't modify the arrays passed in, and should have a lower big-O bound (O(n) instead of O(n^2) - assuming constant time hashtable lookups), but will take up significantly more memory.

function findUnique(a:Array, b:Array):Array {
  var aHash:Hashtable = buildHash(a);
  var bHash:Hashtable = buildHash(b);

  var uniqueFromA:int;
  var uniqueFromB:int;

  for each(value:int in a) {
    if(!bHash.contains(value)) {
      uniqueFromA = value;
      break;
    } else {
      /* Not necessary, but will speed up the 2nd for-loop by removing
       * values we know are duplicates. */
      bHash.remove(value);
    }
  }

  for each(value:int in b) {
    if(!aHash.contains(value)) {
      uniqueFromB = value;
      break;
    }
  }

  return [uniqueFromA, uniqueFromB];
}

function buildHash(a:Array):Hashtable {
  var h:Hashtable = new Hashtable();
  for each(value:int in a) {
    h[value] = true;
  }

  return h;
}

回答7:

You have given two integer arrays each of size 10.
Both contains 9 equal elements (say 1 to 9)
Only one element is different.

Depending on the constraints, you can solve this very quickly in linear time. If you hold an int[10], then you can assume that an element at index 1 corresponds to the number 1; the element itself holds a count of both arrays. The following pseudocode will solve the problem quickly:

let buckets = new int[10] // init all buckets to zero
for num in arr1 do buckets[num]++ // add numbers from the first array
for num in arr2 do buckets[num]++ // add from the second array
for i in 1 to 9 do                // find odd man out
    if buckets[i] <= 1 then return i

This is essentially a bounded hashtable. This only works if our given list of elements is constrained between 1 and 9.

Technically, you don't even need to keep a running count of each element. You could, in principle, simply loop through arr1, then iterate through arr2 until you run across an element which was not hashed from the first array.

回答8:

The logic behind almost all of the previous answers is always the same: use set operations from mathematics to solve the problem.

A set in mathematics can contain each element only once. So the following list can’t be a set in the mathematical sense, since it contains one number (3) twice:

{ 1, 2, 3, 4, 3, 5 }

Since set operations, in particular checking whether an element already exists in a set, are common operations, most languages have data structures that implement these set operations efficiently. So we can simply fall back to this in our solution:

// Construct set from first list:
Set uniques = Set.from(list1);

// Iterate over second list, check each item’s existence in set.
for each (item in list2)
    if (not uniques.Contains(item))
        return item;

Different implementations of sets yield different performance, but this performance will always be superior to the naive solution (for large lists). In particular, two implementations exist:

As a (self-balanced) search tree. The tree has its elements sorted, thus looking up a particular element is efficient by using a binary search (or variant thereof). Lookup thus has a performance of O(log n). Creating the tree set from the input has performance O(n · log n). This is also the overall performance.
Hash tables can be implemented to have average-case look-up performance of O(1) (and with a few tricks, this can also be made the worst-case performance). Creating the hash table can be done in O(n). Therefore, hash tables can achieve an overall runtime of O(n).
Bloom filters offer a nice probabilistic solution – i.e. the solution may actually be wrong, but we can control how (un)likely this will be. It is particularly interesting because it’s very space-efficient.
… many other implementations exist.

In every case, the usage remains the same and the above pseudo-code gives a textbook solution to your problem. A Java implementation might look as follows:

// Construct set from first list:
Set<Integer> uniques = new HashSet<Integer>(list1);

// Iterate over second list, check each item’s existence in set.
for (int item : list2)
    if (! uniques.Contains(item))
        return item;

Notice how this looks almost exactly like the pseudo-code. Solutions in C#, C++ or other languages wouldn’t be much different.

EDIT Oops, I’ve just noticed that the requested return value is the pair of mismatching elements. However, this requirement doesn’t change the reasoning and almost doesn’t change the pseudo-code (do the same thing, with interchanged lists).

回答9:

Depending on the language you are using, it might have a built in way to do the array differences. PHP does http://ca3.php.net/array_diff

回答10:

Here's some simple pseudocode for a solution. I'm assuming it's OK to modify the arrays and that arrays have a remove(value) method (or that you could write one trivially). It takes the 2 arrays and returns an array containg 2 values, the first is the unique value from the first array and the second is the unique value from the 2nd array.

function findUnique(a:Array, b:Array):Array {
  var uniqueFromA:int;
  var uniqueFromB:int;

  for each(value:int in a) {
    var len:int = b.length;
    b.remove(value);
    /* b's length didn't change, so nothing was removed, so the value doesn't
     * exist in it. */
    if(b.length == len) {
      uniqueFromA = value;
    }
  }

  /* Only the unique value in b still exists in b */
  uniqueFromB = b[0];

  return [uniqueFromA, uniqueFromB];
}

回答11:

It is impossible to solve this problem without making comparisons. Some modern languages do have set difference and other aggregate operators, but they work by internally making comparisons. If the total array size is odd (not the case here) then it does work to xor the elements together. I suppose the question of whether the ALU's xor op should be considered a comparison is debatable.

And without specifying a language, you can't refer to a library, so the only possible solution is a comparison-based pseudo-code exposition.

So here you go:

a <- first element
a_count = 1
b_count = 0
loop over remaining elements
   if element != a
     b <- element
     ++b_count
   else
     ++a_count
   if found_result
      break loop
end loop

found_result
   if a_count > 1 and b_count > 0
     the unique one is b
     return true
   if b_count > 1 and a_count > 0 # a_acount actually always > 0
     the unique one is a
     return true
   return false

回答12:

Put each element of the first array into a hash. For each item in the second array, if it's already in the hash, remove it, else add it. At the end you have a hash with two keys, which are your unique elements.

Simple verison in Ruby

def unique(a,b)
  h={}
  a.each do |ax|
    h[ax]=true
  end
  b.each do |bx|
    if h[bx]
      h.delete(bx)
    else
      h[bx]=true
    end
  end
  h.keys
end

With a little more Ruby personality, but still in the universe where we can't simply do (a | b) - (a & b) or a.to_set ^ b.to_set:

def unique(a,b)
  h = {}
  a.each do |ax|
    h[ax] = true
  end
  b.each do |bx|
    h.delete(bx) or h[bx] = true
  end
  h.keys
end

回答13:

Make sets A, B of the arrays a,b respectively. A \ B gives you the additional integer in A. B \ A gives you the additional integer in B.

If either one of these operations returns an empty set, then the additional integer is in the array twice. You find this out when constructing the set: adding a duplicate integer to the set does not increase the set's size.

回答14:

How about getting the total sum for each of the arrays and subtracting one from the other? The difference is the extra element.

eg: A = 1,2,3,4,5 B = 1,2,3,4

sum(A) = 15, sum(B) = 10; 15 - 10 = 5 which is the extra element.

回答15:

This can be done using Xor.

First Xor all elements from both arrays. Let x and y be the extra element each array. What remains is x^y.

Now in xor, if a bit is set, it means that it is set in one of the the two numbers and not in the other one.

We can use this to find the missing indivdual numbers. So find a bit which is set in a^b. Getting Righmost bit is easy. It can be done by

n& ~(n-1)

(1110) & ~(1101) = 0010

To get each of the individual numbers, we split the numbers of both the arrays into 2 parts, numbers have the checked bit set, and which don't.We do XOR on each of the set, so that we get the values a and b. This cancels out all the repeating elements, and separates out x and y.

This can be quite confusing. Now take x=3, y = 2

x=110

y=010

x^y=100

So when we get the bit set, the number we get is bitset = 100. Third bit is set. Suppose array elements be 5,1 ( Both repeated twice)

5=101

6=001

What now, 5 has 3rd bit, so we xor it with x

we get x^5^5 = x

Similarly, 6 doesn't have 3rd bit set, so xor with y.

We get y^1^1 = y

Code

for(i=0;i<n;i++)
xor = xor^a[i]^b[i];

set_bit = xor & ~(xor-1) ;
for(i = 0; i < n; i++)
{
if(a[i] & set_bit_no)
 x = x ^ a[i]; /*XOR of first set */
else
 y = y ^ a[i]; /*XOR of second set*/

if(b[i] & set_bit_no)
 x = x ^ b[i]; /*XOR of first set */
else
 y = y ^ b[i]; /*XOR of second set*/
 }

This is similar to the method posted here http://www.geeksforgeeks.org/find-two-non-repeating-elements-in-an-array-of-repeating-elements/

回答16:

Heres a solution that doesn't require hash or set, O(1) space and O(n) time. Its not the most optimal solution posted but its a good one. All you do is sum the values in list1 and sum the values list2 and find the difference.

Javascript

var findExtra= function (l1, l2) {
    var l1_sum = l1.reduce(function(prev, curr) {
        return prev + curr;
    });
    var l2_sum = l2.reduce(function(prev, curr) {
        return prev + curr;
    });

    console.log(l1_sum, l2_sum);

    if (l1.length < l2.length) return l2_sum - l1_sum;
    return l1_sum - l2_sum;
}

console.log(findExtra([1,2,3,4,-1], [1,2,3,4]));

回答17:

Two passes through the array will suffice.

1st pass: add every item in the shorter list to a hashmap (dict in python). 2nd pass: for each item in the longer list, check if key exist in hashmap (O(1) search time). If not, that key is the unique entry.

Total time complexity: O(2n) = O(n)

回答18:

Given two arrays say A1 of size 'n' and A2 of size 'n-1', both the arrays have same element except one which we have to find.

Note: elements in A1 can be repeated.

Example:

A1:{2,5,5,3}

A2:{2,5,3}

Output: 5

A1:{1,2,3,3,3}

A2:{2,3,1,3}

Output: 3

public static void main(String args[])
{
    int[] a ={1,2,3,3,3};
    int[] b ={2,3,1,3};
    int flag=1;
    int num=0;
    List<Integer> lst = new ArrayList<>(b.length);
    for(int i : b)
        lst.add(Integer.valueOf(i));


    for(int i=0;i<a.length;i++)
    {
        flag=1;
        for(int j=0;j<lst.size();j++)
        {
            if(a[i] == lst.get(j)){
                lst.remove(j);
                flag=0;
                break;
            }

        }
        if(flag == 1)
            num=a[i];
    }

    System.out.println(num);

}

回答19:

 int[] a = {1, 2, 3, 5};
 int[] b = {1, 2, 3, 5, 6};

      HashSet set = new HashSet();
        for (int o : b) {
            set.add(o);
        }

        for (int p : a) {
            if (set.contains(p)) {
                set.remove(p);
            }
        }

        Iterator iterator = set.iterator();
        while (iterator.hasNext()) {
            Log.d("TAG", " " + iterator.next());
        }

来源：https://stackoverflow.com/questions/1590405/distinguishing-extra-element-from-two-arrays

标签

algorithm

arrays

data-structures

logic