Faster algorithm to find unique element between two arrays?

野性不改 2020-12-22 20:20

EDIT: For anyone new to this question, I have posted an answer clarifying what was going on. The accepted answer is the one I feel best answers my question

9 Answers
  • 2020-12-22 20:41

    Assuming only one element was added and the arrays were otherwise identical (same elements in the same order), you could get this down to O(log n).

    The rationale is that any array can be binary-searched in O(log n). Except that in this case you are not searching for a value in an ordered array; you are searching for the first non-matching index. Here a[n] == b[n] means that you are too low, and a[n] != b[n] means that you might be too high, unless a[n-1] == b[n-1] (or n == 0), in which case n is the answer.

    The rest is basic binary search. Check the middle element, decide which division must have the answer, and do a sub-search on that division.
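A minimal sketch of that search, in Python. The function name `find_inserted` is hypothetical, and the sketch assumes that `b` is a copy of `a` with exactly one element inserted, in the same order, and that all values are distinct. With repeated values the "first mismatch" test is no longer monotonic and the search can return the wrong element.

```python
def find_inserted(a, b):
    """Return the element of b that is missing from a.

    Assumes (not checked): b is a with exactly one element inserted,
    order otherwise unchanged, and all values distinct.
    """
    lo, hi = 0, len(a)  # the insertion index lies somewhere in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] == b[mid]:
            lo = mid + 1  # still in the matching prefix: too low
        else:
            hi = mid      # mid is at or past the insertion point
    return b[lo]
```

For example, `find_inserted([1, 2, 3, 4], [1, 2, 9, 3, 4])` narrows the range down to index 2 and returns 9.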

  • 2020-12-22 20:42

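    This is probably the fastest way to do it in Java, using HotLick's suggestion from the comments: XOR every element of both arrays together, so that each matched pair cancels and only the extra element remains. It assumes that b.length == a.length + 1, so b is the larger array containing the extra "unique" element.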

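    public static int getUniqueElement(int[] a, int[] b) {
        // Assumes b.length == a.length + 1.
        int ret = 0;
        int i;
        for (i = 0; i < a.length; i++) {
            // Matching values cancel under XOR, in any order.
            ret = ret ^ a[i] ^ b[i];
        }
        // i == a.length here, so b[i] is the last element of b.
        return ret ^ b[i];
    }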
    

    Even if that assumption cannot be made, you can easily extend it to cover the case where either a or b may be the larger array holding the unique element. It is still O(m+n), though; only the loop and assignment overhead is reduced.

    Edit:

    Due to details of language implementation, this is still (surprisingly) the fastest way to do it in CPython.

    def getUniqueElement1(A, B):
        ret = 0
        for a in A: ret = ret ^ a
        for b in B: ret = ret ^ b
        return ret
    

    I have tested this with the timeit module and found some interesting results. It turns out that the longhand ret = ret ^ a is indeed faster in Python than the shorthand ret ^= a. Also, iterating directly over the elements of a list is much faster in Python than iterating over the indexes and subscripting. That is why this code is much faster than my previous method, where I tried to copy the Java version.
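A rough sketch of how such a comparison could be set up with `timeit`. The function names, input size, and repeat count are assumptions chosen for illustration, not the measurements described above, and the relative timings may differ between interpreter versions and machines.

```python
import random
import timeit

def xor_longhand(A, B):
    ret = 0
    for a in A: ret = ret ^ a
    for b in B: ret = ret ^ b
    return ret

def xor_augmented(A, B):
    ret = 0
    for a in A: ret ^= a
    for b in B: ret ^= b
    return ret

def xor_indexed(A, B):
    # Index-based loop in the style of the Java version;
    # assumes len(B) == len(A) + 1.
    ret = 0
    for i in range(len(A)):
        ret = ret ^ A[i] ^ B[i]
    return ret ^ B[len(A)]

def make_inputs(n, seed=0):
    # Hypothetical test data: B is A plus one extra value, shuffled.
    rng = random.Random(seed)
    A = [rng.randrange(1 << 30) for _ in range(n)]
    B = A + [rng.randrange(1 << 30)]
    rng.shuffle(B)
    return A, B

if __name__ == "__main__":
    A, B = make_inputs(100_000)
    for fn in (xor_longhand, xor_augmented, xor_indexed):
        seconds = timeit.timeit(lambda: fn(A, B), number=20)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

All three variants return the same value, so any timing difference comes from how the interpreter executes the loop rather than from the algorithm.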

    I guess the moral of the story is that there is no single correct answer, because the question is bogus anyway. As the OP noted in another answer below, it turns out you can't really go any faster than O(m+n) on this, and his teacher was just pulling his leg. The problem therefore reduces to finding the fastest way to iterate over all elements of the two arrays and accumulate the XOR of all of them. That depends entirely on the language implementation, so you have to test and experiment to find the true "fastest" solution for whatever implementation you are using; the overall algorithm will not change.

  • 2020-12-22 20:45

    You can store the count of each value of one array in a collection such as an array or hash map, which is O(n). Then you can check the values of the other array against those counts and stop as soon as you find a mismatch. This could mean you only search half of the second array on average.
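One way this counting idea might look in Python, using `collections.Counter` as the hash map. The function name `find_extra_by_counting` is hypothetical, and the sketch assumes the extra element is in `b`.

```python
from collections import Counter

def find_extra_by_counting(a, b):
    """Return the first element of b that a cannot account for.

    Assumes b holds every element of a plus one extra value.
    Returns None if no such element is found.
    """
    remaining = Counter(a)  # value -> occurrences not yet matched
    for x in b:
        if remaining[x] == 0:  # missing keys count as 0 in a Counter
            return x           # early exit on the first mismatch
        remaining[x] -= 1
    return None
```

Unlike the XOR approach, this uses O(n) extra memory, but it also works for values that cannot be XORed and it can stop before reaching the end of `b`.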
