I was asked to write my own implementation to remove duplicated values in an array. Here is what I have created. But after tests with 1,000,000 elements it took very long ti
Okay, so you cannot use Set
or other collections. One solution I don't see here so far is one based on the use of a Bloom filter, which essentially is an array of bits, so perhaps that passes your requirements.
The Bloom filter is a lovely and very handy technique, fast and space-efficient, that can be used to do a quick check of the existence of an element in a set without storing the set itself or the elements. It has a (typically small) false positive rate, but no false negative rate. In other words, for your question, if a Bloom filter tells you that an element hasn't been seen so far, you can be sure it hasn't. But if it says that an element has been seen, you actually need to check. This still saves a lot of time if there aren't too many duplicates in your list (for those, there is no looping to do, except in the small probability case of a false positive --you typically chose this rate based on how much space you are willing to give to the Bloom filter (rule of thumb: less than 10 bits per unique element for a false positive rate of 1%).
There are many implementations of Bloom filters, see e.g. here or here, so I won't repeat that in this answer. Let us just assume the api described in that last reference, in particular, the description of put(E e)
:
true
if the Bloom filter's bits changed as a result of this operation. If the bits changed, this is definitely the first time object has been added to the filter. If the bits haven't changed, this might be the first time object has been added to the filter. (...)
An implementation using such a Bloom filter would then be:
public static int[] removeDuplicates(int[] arr) {
ArrayList out = new ArrayList<>();
int n = arr.length;
BloomFilter bf = new BloomFilter<>(...); // decide how many bits and how many hash functions to use (compromise between space and false positive rate)
for (int e : arr) {
boolean might_contain = !bf.put(e);
boolean found = false;
if (might_contain) {
// check if false positive
for (int u : out) {
if (u == e) {
found = true;
break;
}
}
}
if (!found) {
out.add(e);
}
}
return out.stream().mapToInt(i -> i).toArray();
}
Obviously, if you can alter the incoming array in place, then there is no need for an ArrayList
: at the end, when you know the actual number of unique elements, just arraycopy()
those.