STL containers speed vs. arrays

I just started working on a scientific project where speed really matters (HPC). I'm currently designing the data structes. The core of the project is a 3D-Grid of double values, in order to solve a partial differenital equation.

Since speed here is a probably bigger concern then simplicity of the code, I'd like to know how the STL performs compared to usual C-style arrays. In my case, since it's a 3D-grid, I was thinking of a) a one dimensional vector with linear indexing b) a vector of 3 vectors or c) a one dimensional c-style array or d) a three dimensional c-style array.

I looked up older questions, but I only found questions concering construction/destruction (which is not important here, as the datastructures are only created once at program start - fast indexing and computations on it are important) or comparison of different STL containers.

Thanks for help

It's hard (or even impossible) to say in advance what the relative performances will be. Generally, on modern machines, using a flat vector, and calculating the indexes into it, will outperform a vector of vector of vector; on older machines, the reverse was true. (On modern machines, multiplication is usually cheaper than the page misses due to poor locality. On older machines, multiplication was expensive, and there weren't caches, so locality didn't make a difference.)

Also, depending on the machine and the library implementation, using an std::vector for this flat vector may be more expensive than using a simple pointer to the memory (or it may not be—you can't know in advance). I'd still go for the vector first, wrap everything carefully in a controlling class, and if there were still performance problems, switch to the pointer.

The memory layout of the 1D and 3D arrays is going to be the same, and similar to the memory layout of a linear std::vector. I would take that approach: a unidimensional vector and an inlined function to map to the appropriate location.

The vector of vectors, on the other hand has a completely different layout, as the data in the vector is not stored inside the object but dynamically allocated and referenced. This means that two different rows in a std::vector<std::vector<int>> might be in completely unrelated locations in memory.

A vector will do internally what you need to do manually in order to handle an individually sized raw array, therefore as long as they are used properly the vectors should perform the same as raw arrays performing the same task.

For example the single vector should perform the same as the single dimensional c-array, and the vector of vectors should perform approximately the same as if you used a raw array of pointers, each element of which points into an array.

However if the 3d array has uniform dimension sizes (i.e., not ragged arrays) then vectors pay an extra cost to individually manage their sizes (e.g., they have to store their sizes individually). If any of the dimension sizes are known at compile time then you'd be better off using the 'STL' container 'std::array, because it won't have that unnecessary overhead. You can even have multi-dimentionalstd::arrays`:

template<typename T, int Planes, int Rows, Cols>
using Matrix = std::array<std::array<std::array<T,Cols>,Rows>,Planes>;

Although not strictly required this should be the same as T arr[Planes][Rows][Cols];, but without the problems of raw c-arrays.

A widely used in HPC dynamically allocated static (in terms of unchangeable dimensions after allocation) array object is the combination of a flat array and a dope vector. Basically the idea is to allocate a big flat chunk of memory and then to build a tree of pointers into it. For 2D arrays the tree is just a simple linear list of pointers to the beginning of each row. For 3D arrays the tree has two levels and each of the second level elements points to a row from the 2D slice that corresponds to the first level. Placing the dope vector tree at the beginning of the allocated memory allows you to directly apply the [] indexing operator, e.g. A[i][j][k], but some care has to be taken as &A[i] would not be the beginning of the i-th 2D slice.

One problem of this approach is that the dope vector tree could grow very large in size. For example, the size of this data structure for a 1000x1000x1000 array on a 64-bit machine is 1000x1000x8 bytes or almost 8 MiB. This could take precious cache space for certain data access patterns.

来源：https://stackoverflow.com/questions/14988617/stl-containers-speed-vs-arrays

标签

c++

arrays

performance

stl

hpc