Sort a vector of vectors and count frequencies of unique occurences using the standard library

问题

Given a std::vector<std::vector<int>>:

I want to output a sorted std::vector<std::vector<int>>
that contains only unique std::vector<int>
as well as the frequency (i.e., count) of these unique std::vector<int>

My question is twofold:

how can I efficiently achieve this relying only on the standard library?
what is causing the code below to perform poorly?

I tried the following:

std::vector<std::vector<int>> responseVectors 
{
    { 2,  3,  2 },
    { 3,  2,  3 }, 
    { 3,  2,  3 }, 
    { 3,  3,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 2,  1,  2 }, 
    { 0,  1,  2 }, 
    { 2,  3,  2 },
    { 3,  2,  3 }, 
    { 3,  2,  3 }, 
    { 3,  3,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 2,  1,  2 }, 
    { 0,  1,  2 }
};

std::vector<std::vector<int>> uniqueResponseVectors;
uniqueResponseVectors.reserve(responseVectors.size());

std::sort(responseVectors.begin(), responseVectors.end());
std::unique_copy(responseVectors.begin(), responseVectors.end(), std::back_inserter(uniqueResponseVectors));

std::vector<int> freqTotal;
freqTotal.reserve(uniqueResponseVectors.size());

for(auto&& vector : uniqueResponseVectors)
{
    int count = std::count(responseVectors.begin(), responseVectors.end(), vector);
    freqTotal.push_back(count);
}

And the output should be:

for(auto&& vector : uniqueResponseVectors)
{
    for(auto&& value : vector)
    {
        std::cout << "\t" << value << "\t";
    }

    std::cout << "\n";
}

// Printed result for the `uniqueResponseVectors`:
//
//    0               1               2
//    1               2               3
//    2               1               2
//    2               3               2
//    3               2               3
//    3               3               3

// Similarly for the `freqTotal`:
//
//    2
//    6
//    2
//    2
//    4
//    2

Additional context:

when I tried the code above against a larger dataset (i.e., responseVectors with a size of 100000 and 12 as the size of each std::vector<int>), it was slow.
I also tried to manipulate responseVectors directly by calling std::unique and erase() on it, in order avoid creating a copy. Then I used an iterator loop to std::count how many times a unique std::vector<int> occurs. However, this proved even slower.

回答1:

user463035818 pointed me to an excellent answer by YSC
and PaulMcKenzie further iterated on the importance of picking the proper data structure(s)

Based on this input, I tried to particularize the solution provided by YSC to my case, in order to better understand what is going on. It boils down to using std::map which is a sorted associative container:

std::map<std::vector<int>, int> SortAndCountOccurences(std::vector<std::vector<int>>& vectors)
{
    std::map<std::vector<int>, int> result;

    std::for_each(vectors.begin(), vectors.end(), [&result](auto const& vector) {
            ++result[vector]; 
    });

    return result;
}

With the following usage:

std::vector<std::vector<int>> responseVectors 
{
    { 2,  3,  2 },
    { 3,  2,  3 }, 
    { 3,  2,  3 }, 
    { 3,  3,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 2,  1,  2 }, 
    { 0,  1,  2 }, 
    { 2,  3,  2 },
    { 3,  2,  3 }, 
    { 3,  2,  3 }, 
    { 3,  3,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 1,  2,  3 }, 
    { 2,  1,  2 }, 
    { 0,  1,  2 }
};

std::map<std::vector<int>, int> sortedVectors = SortAndCountOccurences(responseVectors);

Which will output:

for(auto&& vector : sortedVectors)
{
    for(auto&& value : vector.first)
    {
        std::cout << "\t" << value << "\t";
    }

    std::cout << "-> x" << vector.second << " times" << "\n";
}

//    0               1               2       -> x2 times
//    1               2               3       -> x6 times
//    2               1               2       -> x2 times
//    2               3               2       -> x2 times
//    3               2               3       -> x4 times
//    3               3               3       -> x2 times

Note: The solution by YSC can be generalized to anything iterable.

来源：https://stackoverflow.com/questions/54479308/sort-a-vector-of-vectors-and-count-frequencies-of-unique-occurences-using-the-st

标签

c++

sorting

count

unique

std