问题
Given a std::vector<std::vector<int>>
:
- I want to output a sorted
std::vector<std::vector<int>>
- that contains only unique
std::vector<int>
- as well as the frequency (i.e., count) of these unique
std::vector<int>
My question is twofold:
- how can I efficiently achieve this relying only on the standard library?
- what is causing the code below to perform poorly?
I tried the following:
std::vector<std::vector<int>> responseVectors
{
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 },
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 }
};
std::vector<std::vector<int>> uniqueResponseVectors;
uniqueResponseVectors.reserve(responseVectors.size());
std::sort(responseVectors.begin(), responseVectors.end());
std::unique_copy(responseVectors.begin(), responseVectors.end(), std::back_inserter(uniqueResponseVectors));
std::vector<int> freqTotal;
freqTotal.reserve(uniqueResponseVectors.size());
for(auto&& vector : uniqueResponseVectors)
{
int count = std::count(responseVectors.begin(), responseVectors.end(), vector);
freqTotal.push_back(count);
}
And the output should be:
for(auto&& vector : uniqueResponseVectors)
{
for(auto&& value : vector)
{
std::cout << "\t" << value << "\t";
}
std::cout << "\n";
}
// Printed result for the `uniqueResponseVectors`:
//
// 0 1 2
// 1 2 3
// 2 1 2
// 2 3 2
// 3 2 3
// 3 3 3
// Similarly for the `freqTotal`:
//
// 2
// 6
// 2
// 2
// 4
// 2
Additional context:
when I tried the code above against a larger dataset (i.e.,
responseVectors
with a size of 100000 and 12 as the size of eachstd::vector<int>
), it was slow.I also tried to manipulate
responseVectors
directly by callingstd::unique
anderase()
on it, in order avoid creating a copy. Then I used an iterator loop tostd::count
how many times a uniquestd::vector<int>
occurs. However, this proved even slower.
回答1:
user463035818 pointed me to an excellent answer by YSC
and PaulMcKenzie further iterated on the importance of picking the proper data structure(s)
Based on this input, I tried to particularize the solution provided by YSC to my case, in order to better understand what is going on. It boils down to using std::map
which is a sorted associative container:
std::map<std::vector<int>, int> SortAndCountOccurences(std::vector<std::vector<int>>& vectors)
{
std::map<std::vector<int>, int> result;
std::for_each(vectors.begin(), vectors.end(), [&result](auto const& vector) {
++result[vector];
});
return result;
}
With the following usage:
std::vector<std::vector<int>> responseVectors
{
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 },
{ 2, 3, 2 },
{ 3, 2, 3 },
{ 3, 2, 3 },
{ 3, 3, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 1, 2, 3 },
{ 2, 1, 2 },
{ 0, 1, 2 }
};
std::map<std::vector<int>, int> sortedVectors = SortAndCountOccurences(responseVectors);
Which will output:
for(auto&& vector : sortedVectors)
{
for(auto&& value : vector.first)
{
std::cout << "\t" << value << "\t";
}
std::cout << "-> x" << vector.second << " times" << "\n";
}
// 0 1 2 -> x2 times
// 1 2 3 -> x6 times
// 2 1 2 -> x2 times
// 2 3 2 -> x2 times
// 3 2 3 -> x4 times
// 3 3 3 -> x2 times
Note: The solution by YSC can be generalized to anything iterable.
来源:https://stackoverflow.com/questions/54479308/sort-a-vector-of-vectors-and-count-frequencies-of-unique-occurences-using-the-st