Have many vectors sorted together [duplicate]

Posted by ε祈祈猫儿з on 2019-12-12 09:26:43

Question


I have three vectors of the same size (~ 1 million items):

std::vector<std::wstring> name;
std::vector<int> x;
std::vector<int> y;

which can be seen as three "columns".

How can I sort the vector name from A to Z:

std::sort(name.begin(), name.end());

so that the vectors x and y are reordered accordingly?


Example:

name  x  y                 name  x  y
BCD   7  9                 ABC   4  3
ZYX   1  4        =>       BCD   7  9
ABC   4  3                 ZYX   1  4

The good thing about using a std::vector is that I can easily select or filter a few items from the big vector with just a list of indices to keep (for example, keep items 12, 1872, 2834, 1831). I thought about using a std::map, but I fear it won't be as efficient for this: how would I keep a list of elements to keep in a map?


Answer 1:


There are a couple of possible ways to do this. The easiest way is to wrap name, x, and y in a struct:

struct Person {
    std::wstring name;
    int x;
    int y;
};

Then you can have a std::vector<Person> people, and sorting it would be (assuming C++14, which the generic lambda requires):

std::sort(people.begin(), people.end(),
    [](auto const& lhs, auto const& rhs) { return lhs.name < rhs.name; });
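
As a worked example, here is a minimal sketch that wraps that sort in a function so it can be tried on the three rows from the question. The helper name sortedByName is an assumption made for illustration, and the assert is only a debug-time sanity check.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// One row of the table: the three "columns" from the question.
struct Person {
    std::wstring name;
    int x;
    int y;
};

// Returns the rows sorted A->Z by name. Each row moves as a unit,
// so x and y stay attached to their name.
// sortedByName is a hypothetical helper name used only in this sketch.
inline std::vector<Person> sortedByName(std::vector<Person> people) {
    auto const byName = [](auto const& lhs, auto const& rhs) {
        return lhs.name < rhs.name;
    };
    std::sort(people.begin(), people.end(), byName);
    // Debug-only sanity check: the rows are now ordered by name.
    assert(std::is_sorted(people.begin(), people.end(), byName));
    return people;
}
```

Calling sortedByName on the rows (BCD, 7, 9), (ZYX, 1, 4), (ABC, 4, 3) should give the right-hand table from the question.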

However, if you know that this will cause performance problems because fewer elements fit in the cache (that is, you frequently iterate over only x or only y, and you are in a very constrained environment such as high-performance gaming), I'd suggest keeping the vectors separate and sorting only one vector. Unless you already know which layout suits your workload, you'd need to benchmark both options.

Basically, have a vector that keeps track of the ordering:

std::vector<std::wstring> name;
std::vector<int> x;
std::vector<int> y;

std::vector<std::size_t> ordering(name.size());
std::iota(ordering.begin(), ordering.end(), 0);

std::sort(ordering.begin(), ordering.end(),
    [&](auto const& lhs, auto const& rhs) {
        return name[lhs] < name[rhs];
    });

Then you can simply iterate over ordering to go through each parallel vector in the new order.
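
Iterating through ordering could look roughly like the sketch below. The helper names makeOrdering and xInNameOrder are assumptions made for illustration; the second one only collects the x values in visiting order to make the indirect access visible.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Builds the index permutation that would sort `name` in ascending order.
// makeOrdering is a hypothetical helper name.
inline std::vector<std::size_t> makeOrdering(std::vector<std::wstring> const& name) {
    std::vector<std::size_t> ordering(name.size());
    std::iota(ordering.begin(), ordering.end(), std::size_t{0});  // 0, 1, 2, ...
    std::sort(ordering.begin(), ordering.end(),
              [&name](std::size_t lhs, std::size_t rhs) {
                  return name[lhs] < name[rhs];
              });
    return ordering;
}

// Visits the rows in name order without moving any element and
// returns the x values in that visiting order.
// xInNameOrder is a hypothetical helper name.
inline std::vector<int> xInNameOrder(std::vector<std::wstring> const& name,
                                     std::vector<int> const& x) {
    assert(name.size() == x.size());  // parallel vectors must have equal length
    std::vector<int> result;
    result.reserve(x.size());
    for (std::size_t i : makeOrdering(name)) {
        result.push_back(x[i]);  // indirect access through the ordering
    }
    return result;
}
```

The original vectors stay untouched here, so the same index i can be used to read name[i], x[i], and y[i] for one row.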

It's possible that the extra level of indirection will make it less efficient. For example, the CPU might think that there's a data dependency where there is none. Furthermore, the extra data we are keeping track of in ordering could easily take enough room in the cache to counteract the benefit of separating name, x, and y; you'd need to know the specifications of your target architecture and profile to be sure.

If you want to keep iterating over the vectors in this new order, you should use the ordering vector to actually reorder the other vectors, because going through ordering makes the element accesses effectively random. That random access would counteract the benefit of keeping the vectors separate (unless the vectors are small enough to fit in the cache anyway).

The easiest way to do that would be to create a new vector:

std::vector<std::wstring> newNames;
newNames.reserve(name.size());

for (auto i : ordering) {
    newNames.push_back(name[i]);
}

Reconstructing the vectors like this is probably what you want to do if the sorting happens during initialization.
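
The same loop can be generalized so that one ordering rearranges all three vectors. This is a minimal sketch, assuming ordering is a permutation of the valid indices; applyOrdering and reorderColumns are hypothetical helper names, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns a copy of `values` rearranged so that result[k] == values[ordering[k]].
// `ordering` is assumed to be a permutation of 0..values.size()-1.
// applyOrdering is a hypothetical helper name.
template <typename T>
std::vector<T> applyOrdering(std::vector<T> const& values,
                             std::vector<std::size_t> const& ordering) {
    assert(values.size() == ordering.size());
    std::vector<T> result;
    result.reserve(values.size());
    for (std::size_t i : ordering) {
        result.push_back(values[i]);
    }
    return result;
}

// Rearranges the three parallel vectors with the same ordering,
// so that row k of each vector still belongs to the same record.
// reorderColumns is a hypothetical helper name.
inline void reorderColumns(std::vector<std::wstring>& name,
                           std::vector<int>& x,
                           std::vector<int>& y,
                           std::vector<std::size_t> const& ordering) {
    name = applyOrdering(name, ordering);
    x = applyOrdering(x, ordering);
    y = applyOrdering(y, ordering);
}
```

This version trades extra memory for simplicity: each column is briefly held twice while its reordered copy is built.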




Answer 2:


It sounds like you want a struct to keep the data together. For example:

struct MyData
{
  std::wstring name;
  int x;
  int y;
};

...
std::vector<MyData> data;

From there, you'll want a comparison function for a custom sort, so that the data is ordered by the field you choose:

// The comparator must be declared before the call that uses it.
bool compareByName(const MyData& lhs, const MyData& rhs)
{
    return lhs.name < rhs.name; // Any strict weak ordering works here
}

std::sort(data.begin(), data.end(), compareByName);
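
The question also asks about keeping only a few items given a list of indices. With a vector of structs that still works by plain indexing, as in this minimal sketch; selectRows is a hypothetical helper name, and every index in keep is assumed to be valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct MyData {
    std::wstring name;
    int x;
    int y;
};

// Copies the rows at the given positions, in the order the positions are listed.
// Every index in `keep` is assumed to be smaller than data.size().
// selectRows is a hypothetical helper name.
inline std::vector<MyData> selectRows(std::vector<MyData> const& data,
                                      std::vector<std::size_t> const& keep) {
    std::vector<MyData> result;
    result.reserve(keep.size());
    for (std::size_t i : keep) {
        assert(i < data.size());  // debug-only bounds check
        result.push_back(data[i]);
    }
    return result;
}
```

Note that the indices refer to positions at the time of the call, so a list of indices collected before sorting no longer points at the same records afterwards.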


Source: https://stackoverflow.com/questions/45068782/have-many-vectors-sorted-together
