Which is the fastest? A boost::multi_array or a std::vector?

|▌冷眼眸甩不掉的悲伤 提交于 2021-01-28 06:51:28

问题


Which is the fastest? A boost::multi_array or a std::vector? I will have (not constant) 17.179.869 elements stored in 3 dimensions which will need to be accessed inside a for loop very rapidly and very often. What would be the most performing? An std::vector or a boost::multi_array?

(I don't expect it to be done within a second, but I would like to have it as efficient as possible because a nanosecond difference can save a lot of time.)


回答1:


Best advice is to benchmark it by yourself.

In any case, since you seem to have constant size there are other solutions:

  • plain C arrays (eg. int data[X][Y][Z])
  • plain one dimensional C array in which you compute indices by yourself, eg X*W*H + Y*W + Z, can be handy in some situations
  • std::array, which is basically a C++ array with some synctactic sugar taken from STL collections
  • std::vector, which I guess is the first solution that can be tried
  • boost::multi_array, which is meant to support N dimensional arrays so it can be overkill for your purpose but probably has a better locality of data compared to a vector.



回答2:


Those library vector classes are designed to be easy to use and relatively fail-safe. They are as fast as they can be within their design, but nothing can beat doing it yourself (except maybe hand-coded assembly). For the size you're talking about (2e10 elements), I would be more concerned with efficiency than with user-friendliness. If your innermost loop does very little computation per element, you're going to find the indexing calculations dominant, which suggests doing some unrolling and pointer-stepping. (Maybe you can count on the compiler to do some unrolling, but I don't care for maybes.)




回答3:


The only way to know for sure is to try both and profile the code. However as a bunch of ideas, this is what I think you'll find.

  1. For a large number of elements that you are dealing with (2e10+) the access to elements is not going to be as significant as the cache pressure to load those elements into the cpu cache. The prefetcher is going to be sitting there trying to load those elements, which is going to take a large proportion of the time.
  2. Accessing 2(or 3D) non-contiguous C arrays means the CPU has to go around fetching things from different parts of memory. boost::multi_array solves that somewhat by behind the scenes storing it as a contiguous block; but it has it's own overheads for doing so. As @Jack said, plain 1D arrays with indices are best, and even then you can do things to ensure the indexing is minimal.(eg memoization)
  3. The work you do within the loop is going to affect your timings significantly. The branch predictor is going to be the biggest contributor. If it's a simple math operation, no if/else statements you'll likely get the best performance, and the compiler will likely optimise it to SSE instructions. If you have composite types (rather than int/float/char) then you're going to have to lay them out right to optimise access.
  4. I would almost suggest, try both, then come back with a new SO question that has your loop written and ask how to optimise that part. Almost always, the compiler can be given hints to ensure it knows your intentions.

At the end of the day, try it and see



来源:https://stackoverflow.com/questions/17887532/which-is-the-fastest-a-boostmulti-array-or-a-stdvector

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!