What is the most efficient way of copying elements that occur only once in a std vector?


Question


I have a std vector with elements like this:

[0, 1, 2, 0, 2, 1, 0, 0, 188, 220, 0, 1, 2]

What is the most efficient way to find and copy the elements that occur only once in this vector, excluding the brute-force O(n^2) algorithm? In this case the new list should contain [188, 220].


Answer 1:


  1. Make an unordered_map<DataType, Count> count;
  2. Iterate over the input vector, increasing the count of each value, i.e. count[value]++;
  3. Iterate over the count map copying keys for which value is 1.

It's O(n). Since it relies on hashing, a plain std::map might be more efficient for small data sets, though technically that would be O(n log n).

It's a good method for discrete data sets.

Code example:

#include <iostream>
#include <unordered_map>
#include <vector>
#include <algorithm>
using namespace std;

int main() {
    vector<int> v{1,1,2,3,3,4};
    unordered_map<int,int> count;
    for (const auto& e : v) count[e]++;
    vector<int> once;
    for (const auto& e : count) if(e.second == 1) once.push_back(e.first);
    for (const auto& e : once) cout << e << '\n';
    return 0;
}

I have tried a few ideas, but I don't see a way around a map. unordered_multiset is almost a great way... except it does not allow you to iterate over distinct keys. It has a method to check the count of a key, but you would need another set of just the keys to probe. I don't see that as a simpler way. In modern C++, with auto, the counting is easy. I've also looked through the algorithm library, but I haven't found any transform, copy_if, generate, etc. that could conditionally transform an element (map entry -> value if count is 1).
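
For what it's worth, here is a minimal sketch of that multiset variant (the separate key set, the sample data and the variable names are mine); it works, but as noted above it is not really simpler than the map version:

#include <iostream>
#include <unordered_set>
#include <vector>
using namespace std;

int main() {
    vector<int> v{0, 1, 2, 0, 2, 1, 0, 0, 188, 220, 0, 1, 2};
    unordered_multiset<int> occurrences(v.begin(), v.end()); // stores every occurrence
    unordered_set<int> keys(v.begin(), v.end());             // distinct keys to probe
    vector<int> once;
    for (int k : keys)
        if (occurrences.count(k) == 1) once.push_back(k);    // keep keys seen exactly once
    for (int x : once) cout << x << '\n';                    // prints 188 and 220, in unspecified order
    return 0;
}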




Answer 2:


There are very few universally-optimal algorithms. Which algorithm works best usually depends upon the properties of the data that's being processed. Removing duplicates is one such example.

Is v small and filled mostly with unique values?

auto lo = v.begin(), hi = v.end();
std::sort(lo, hi);
while (lo != v.end()) {
    hi = std::mismatch(lo + 1, v.end(), lo).first;
    lo = (std::distance(lo, hi) == 1) ? hi : v.erase(lo, hi);
}

Is v small and filled mostly with duplicates?

auto lo = v.begin(), hi = v.end();
std::sort(lo, hi);
while (lo != v.end()) {
    hi = std::upper_bound(lo + 1, v.end(), *lo);
    lo = (std::distance(lo, hi) == 1) ? hi : v.erase(lo, hi);
}

Is v gigantic?

std::unordered_map<int, bool> keyUniqueness{};
keyUniqueness.reserve(v.size());
for (int key : v) {
    bool wasMissing = keyUniqueness.find(key) == keyUniqueness.end();
    keyUniqueness[key] = wasMissing;
}
v.clear();
for (const auto& element : keyUniqueness) {
    if (element.second) { v.push_back(element.first); }
}

And so on.
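
To make the last ("is v gigantic?") variant runnable as-is, here is a minimal driver around it; the headers, the main function and the sample data are my additions, not part of the original answer:

#include <iostream>
#include <unordered_map>
#include <vector>

int main() {
    std::vector<int> v{0, 1, 2, 0, 2, 1, 0, 0, 188, 220, 0, 1, 2};

    std::unordered_map<int, bool> keyUniqueness{};
    keyUniqueness.reserve(v.size());
    for (int key : v) {
        // true only the first time a key is seen, false as soon as it repeats
        bool wasMissing = keyUniqueness.find(key) == keyUniqueness.end();
        keyUniqueness[key] = wasMissing;
    }
    v.clear();
    for (const auto& element : keyUniqueness) {
        if (element.second) { v.push_back(element.first); }
    }

    for (int x : v) std::cout << x << ' ';  // prints 188 and 220, in unspecified order
    return 0;
}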




Answer 3:


@luk32's answer is definitely the most time-efficient way of solving this question. However, if you are short on memory and can't afford an unordered_map, there are other ways of doing it.

You can use std::sort() to sort the vector first. The non-duplicates can then be found in a single pass, for an overall complexity of O(n log n).
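
As an illustration, here is a minimal sketch of that sort-then-scan approach (the sample data and variable names are mine, not from the original answer):

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{0, 1, 2, 0, 2, 1, 0, 0, 188, 220, 0, 1, 2};
    std::sort(v.begin(), v.end());                 // O(n log n)

    std::vector<int> once;
    for (std::size_t i = 0; i < v.size(); ) {
        std::size_t j = i;
        while (j < v.size() && v[j] == v[i]) ++j;  // advance past the run of equal values
        if (j - i == 1) once.push_back(v[i]);      // a run of length 1 means the value occurs once
        i = j;                                     // continue after the run
    }

    for (int x : once) std::cout << x << ' ';      // prints: 188 220
    return 0;
}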

If the question is slightly different, and you know there is only one non-duplicate element, you can use this code (code in Java). The complexity here is O(n).




Answer 4:


Since you use a std::vector, I presume you want to maximize all of its benefits, including locality of reference. In order to do that, we need a bit of typing here. And I benchmarked the code below...

I have a linear O(n) scan here (effectively O(n log n) once the required sort is included); it's a bit like Brian's answer, but I use OutputIterators instead of doing it in place. The precondition is that the range is sorted.


template<typename ForwardIterator, typename OutputIterator>
OutputIterator single_unique_copy(ForwardIterator first, ForwardIterator last, OutputIterator result){
    auto previous = first;
    if(previous == last) return result;
    ++first;
    while(true){
        if(first != last && *first == *previous){
            // *previous starts a run of duplicates: skip past the whole run
            while((++first != last) && (*first == *previous));
        }
        else{
            // run of length one: *previous occurs exactly once, so copy it
            *(result++) = *previous;
        }
        if(first == last) break;
        previous = first;
        ++first;
    }
    return result;
}

And here is a sample usage:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main(){
    std::vector<int> vm = {0, 1, 2, 0, 2, 1, 0, 0, 1, 88, 220, 0, 1, 2, 227, -8};
    std::vector<int> kk;
    std::sort(vm.begin(), vm.end());
    single_unique_copy(vm.begin(), vm.end(), std::back_inserter(kk));
    for(auto x : kk) std::cout << x << ' ';
    return 0;
}

As expected, the output is:

-8 88 220 227

Your use case may be different from mine, so, profile first... :-)

EDIT:

  • Using luk32's algorithm and mine... Using 13 million elements, created in descending order, with values repeating at every i % 5.
  • Under a debug build: luk32: 9.34 seconds, mine: 7.80 seconds
  • Under -O3: luk32: 2.71 seconds, mine: 0.52 seconds
  • MinGW 5.1 64-bit, Windows 10, 1.73 GHz Core i5-4210U, 6 GB DDR3 1600 MHz RAM
  • Benchmark here: http://coliru.stacked-crooked.com/a/187e5e3841439742

For smaller inputs the difference still holds, until it becomes non-critical code.



Source: https://stackoverflow.com/questions/35215773/what-is-the-most-efficient-way-of-copying-elements-that-occur-only-once-in-a-std
