More elegant way to check for duplicates in C++ array?

旧时模样 提交于 2019-11-27 13:15:43
Puppy

You could sort the array in O(nlog(n)), then simply look until the next number. That is substantially faster than your O(n^2) existing algorithm. The code is also a lot cleaner. Your code also doesn't ensure no duplicates were inserted when they were re-entered. You need to prevent duplicates from existing in the first place.

std::sort(userNumbers.begin(), userNumbers.end());
for(int i = 0; i < userNumbers.size() - 1; i++) {
    if (userNumbers[i] == userNumbers[i + 1]) {
        userNumbers.erase(userNumbers.begin() + i);
        i--;
    }
}

I also second the reccomendation to use a std::set - no duplicates there.

The following solution is based on sorting the numbers and then removing the duplicates:

#include <algorithm>

int main()
{
    int userNumbers[6];

    // ...

    int* end = userNumbers + 6;
    std::sort(userNumbers, end);
    bool containsDuplicates = (std::unique(userNumbers, end) != end);
}

Indeed, the fastest and as far I can see most elegant method is as advised above:

std::vector<int> tUserNumbers;
// ...
std::set<int> tSet(tUserNumbers.begin(), tUserNumbers.end());
std::vector<int>(tSet.begin(), tSet.end()).swap(tUserNumbers);

It is O(n log n). This however does not make it, if the ordering of the numbers in the input array needs to be kept... In this case I did:

    std::set<int> tTmp;
    std::vector<int>::iterator tNewEnd = 
        std::remove_if(tUserNumbers.begin(), tUserNumbers.end(), 
        [&tTmp] (int pNumber) -> bool {
            return (!tTmp.insert(pNumber).second);
    });
    tUserNumbers.erase(tNewEnd, tUserNumbers.end());

which is still O(n log n) and keeps the original ordering of elements in tUserNumbers.

Cheers,

Paul

You can add all elements in a set and check when adding if it is already present or not. That would be more elegant and efficient.

It is in extension to the answer by @Puppy, which is the current best answer.

PS : I tried to insert this post as comment in the current best answer by @Puppy but couldn't so as I don't have 50 points yet. Also a bit of experimental data is shared here for further help.

Both std::set and std::map are implemented in STL using Balanced Binary Search tree only. So both will lead to a complexity of O(nlogn) only in this case. While the better performance can be achieved if a hash table is used. std::unordered_map offers hash table based implementation for faster search. I experimented with all three implementations and found the results using std::unordered_map to be better than std::set and std::map. Results and code are shared below. Images are the snapshot of performance measured by LeetCode on the solutions.

bool hasDuplicate(vector<int>& nums) {
    size_t count = nums.size();
    if (!count)
        return false;
    std::unordered_map<int, int> tbl;
    //std::set<int> tbl;
    for (size_t i = 0; i < count; i++) {
        if (tbl.find(nums[i]) != tbl.end())
            return true;
        tbl[nums[i]] = 1;
        //tbl.insert(nums[i]);
    }
    return false;
}

unordered_map Performance (Run time was 52 ms here)

Set/Map Performance

I'm not sure why this hasn't been suggested but here is a way in base 10 to find duplicates in O(n).. The problem I see with the already suggested O(n) solution is that it requires that the digits be sorted first.. This method is O(n) and does not require the set to be sorted. The cool thing is that checking if a specific digit has duplicates is O(1). I know this thread is probably dead but maybe it will help somebody! :)

/*
============================
Foo
============================
* 
   Takes in a read only unsigned int. A table is created to store counters 
   for each digit. If any digit's counter is flipped higher than 1, function
   returns. For example, with 48778584:
    0   1   2   3   4   5   6   7   8   9
   [0] [0] [0] [0] [2] [1] [0] [2] [2] [0]

   When we iterate over this array, we find that 4 is duplicated and immediately
   return false.

*/
bool Foo( unsigned const int &number)
{
    int temp = number;
    int digitTable[10]={0};

    while(temp > 0)
    {
        digitTable[temp % 10]++; // Last digit's respective index.
        temp /= 10; // Move to next digit
    }

    for (int i=0; i < 10; i++)
    {
        if (digitTable [i] > 1)
        {
            return false;
        }
    }
    return true;
}

It's ok, specially for small array lengths. I'd use more efficient aproaches (less than n^2/2 comparisons) if the array is mugh bigger - see DeadMG's answer.

Some small corrections for your code:

  • Instead of int j = i writeint j = i +1 and you can omit your if(j != i) test
  • You should't need to declare i variable outside the for statement.
//std::unique(_copy) requires a sorted container.
std::sort(cont.begin(), cont.end());

//testing if cont has duplicates
std::unique(cont.begin(), cont.end()) != cont.end();

//getting a new container with no duplicates
std::unique_copy(cont.begin(), cont.end(), std::back_inserter(cont2));
#include<iostream>
#include<algorithm>

int main(){

    int arr[] = {3, 2, 3, 4, 1, 5, 5, 5};
    int len = sizeof(arr) / sizeof(*arr); // Finding length of array

    std::sort(arr, arr+len);

    int unique_elements = std::unique(arr, arr+len) - arr;

    if(unique_elements == len) std::cout << "Duplicate number is not present here\n";
    else std::cout << "Duplicate number present in this array\n";

    return 0;
}

As mentioned by @underscore_d, an elegant and efficient solution would be,

#include <algorithm>
#include <vector>

template <class Iterator>
bool has_duplicates(Iterator begin, Iterator end) {
    using T = typename std::iterator_traits<Iterator>::value_type;
    std::vector<T> values(begin, end);

    std::sort(values.begin(), values.end());
    return (std::adjacent_find(values.begin(), values.end()) != values.end());
}

int main() {
    int user_ids[6];
    // ...
    std::cout << has_duplicates(user_ids, user_ids + 6) << std::endl;
}

I think @Michael Jaison G's solution is really brilliant, I modify his code a little to avoid sorting. (By using unordered_set, the algorithm may faster a little.)

template <class Iterator>
bool isDuplicated(Iterator begin, Iterator end) {
    using T = typename std::iterator_traits<Iterator>::value_type;
    std::unordered_set<T> values(begin, end);
    std::size_t size = std::distance(begin,end);
    return size != values.size();
}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!