C/C++ performance of static arrays vs dynamic arrays

泪湿孤枕 提交于 2020-05-09 18:00:24

问题


When performance is essential to an application, should consideration be given whether to declare an array on the stack vs the heap? Allow me to outline why this question has come to mind.

Since arrays in C/C++ are not objects and decay to pointers, the compiler uses the provided index to perform pointer arithmetic to access elements. My understanding is that this procedure differs from a statically declared array to a dynamically declared array when going past the first dimension.

If I were to declare an array on the stack as follows;

  int array[2][3] = { 0, 1, 2, 3, 4, 5 }
  //In memory        { row1 } { row2 }

This array would be stored in Row Major format in memory since it is stored in a contiguous block of memory. This means when I try to access an element in the array, the compiler must perform some addition and multiplication in order to ascertain the correct location.

So if I were to do the following

  int x = array[1][2]; // x = 5

The compiler would then use this formula where:

i = row index j = column index n = size of a single row (here n = 2)
array = pointer to first element

  *(array + (i*n) + j)
  *(array + (1*2) + 2)  

This means if I were to loop over this array to access each of its elements, an additional multiplication step is performed for each access by index.

Now, in an array declared on the heap, the paradigm is different and requires a multi stage solution. Note: I could also use the C++ new operator here, but I believe there is no difference in how the data is represented.

  int ** array;
  int rowSize = 2;
  // Create a 2 by 3 2d array on the heap
  array = malloc(2 * sizeof(int*));
  for (int i = 0; i < 2; i++) {
      array[i] = malloc(3 * sizeof(int));
  }

  // Populating the array
  int number = 0;
  for (int i = 0; i < 2; i++) {
      for (int j = 0l j < 3; j++) {
          array[i][j] = number++;
      }
  }

Since the array is now dynamic, its representation is a one dimensional array of one dimensional arrays. I will try to draw an ascii picture...

              int *        int int int
int ** array-> [0]          0   1   2
               [1]          3   4   5

This would imply that multiplication is no longer involved right? If I were to do the following

int x = array[1][1];

This would then perform indirection/pointer arithmetic on array[1] to access a pointer to the second row and then perform this once again to access the second element. Am I correct in saying this?

Now that there is some context, back to the question. If I am writing code for an application that requires crisp performance, like a game which has around 0.016 seconds to render a frame, should I think twice about using an array on the stack vs the heap? Now I realize there is a one time cost for using malloc or the new operator, but at a certain point (just like Big O analysis) when the data set becomes large, would one be better off iterating through a dynamic array to avoid row major indexing?


回答1:


These will apply to "plain" C (not C++).

First let's clear some terminology

"static" is a keyword in C which will drastically change the way your variable is allocated / accessed if it is applied on variables declared within functions.

There are 3 places (regarding C) where a variable (including arrays) may sit:

  • Stack: these are function local variables without static.
  • Data section: space is allocated for these when the program starts. These are any global variables (be it static or not, there the keyword relates to visibility), and any function local variables declared static.
  • Heap: dynamically allocated memory (malloc() & free()) referred by a pointer. You access this data only through pointers.

Now let's see how one dimensional arrays are accessed

If you access an array with a constant index (may be #defined, but not const in plain C), this index can be calculated by the compiler. If you have a true array in the Data section, it will be accessed without any indirection. If you have a pointer (Heap) or an array on the Stack, an indirection is always necessary. So arrays in the Data section with this type of access may be a very little bit faster. But this is not a very useful thing which would turn the world.

If you access an array with an index variable, it essentially always decays to a pointer since the index may change (for example increment in a for loop). The generated code will likely be very similar or even identical for all types here.

Bring in more dimensions

If you declare a two or more dimensional array, and access it partially or fully by constants, an intelligent compiler may well optimize these constants out as above.

If you access by indices, note that the memory is linear. If the later dimensions of a true array are not a multiple of 2, the compiler will need to generate multiplications. For example in the array int arr[4][12]; the second dimension is 12. If you now access it as arr[i][j] where i and j are index variables, the linear memory has to be indexed as 12 * i + j. So the compiler has to generate code to multiply with a constant here. The complexity depends on how "far" the constant is from a power of 2. Here the resulting code will likely look somewhat like calculating (i<<3) + (i<<2) + j to access the element in the array.

If you build up the two dimensional "array" from pointers, the size of the dimensions do not matter since there are reference pointers in your structure. Here if you can write arr[i][j], that implies you declared it as for example int* arr[4], and then malloc()ed four chunks of memory of 12 ints each into it. Note that your four pointers (which the compiler now can use as base) also consume memory which wasn't taken if it was a true array. Also note that here the generated code will contain a double indirection: First the code loads a pointer by i from arr, then it will load an int from that pointer by j.

If the lengths are "far" from powers of 2 (so complex "multiply with constant" codes would have to be generated to access the elements) then using pointers may generate faster access codes.

As James Kanze mentioned in his answer, in some circumstances the compiler may be able to optimize access for true multi-dimensional arrays. This kind of optimization is impossible for arrays composed from pointers as the "array" is actually not a linear chunk of memory that case.

Locality matters

If you are developing for usual desktop / mobile architectures (Intel / ARM 32 / 64 bit processors) locality also matters. That is what is likely sitting in the cache. If your variables are already in the cache for some reason, they will be accessed faster.

In the term of locality Stack is always the winner since the Stack is so frequently used that it is very likely to always sit in the cache. So small arrays are best put in there.

Using true multi-dimensional arrays instead of composing one from pointers may also help on this ground since a true array is always a linear chunk of memory, so it usually might need fewer blocks of cache to load in. A scattered pointer composition (that is if using separately malloc()ed chunks) to the contrary might need more cache blocks, and may rise cache line conflicts depending on how the chunks physically ended up on the heap.




回答2:


The usual way of implementing a 2 dimensional array in C++ would be to wrap it in a class, using std::vector<int>, and have class accessors which calculate the index. However:

Any questions concerning optimization can only be answered by measuring, and even then, they are only valid for the compiler you are using, on the machine on which you do the measurements.

If you write:

int array[2][3] = { ... };

and then something like:

for ( int i = 0; i != 2; ++ i ) {
    for ( int j = 0; j != 3; ++ j ) {
        //  do something with array[i][j]...
    }
}

It's hard to imagine a compiler which doesn't actually generate something along the lines of:

for ( int* p = array, p != array + whatever; ++ p ) {
    //  do something with *p
}

This is one of the most fundamental optimizations around, and has been for at least 30 years.

If you dynamically allocate as you propose, the compiler will not be able to apply this optimization. And even for a single access: the matrix has poorer locality, and requires more memory accesses, so would likely be less performant.

If you're in C++, you would normally write a Matrix class, using std::vector<int> for the memory, and calculating the indexes explicitly using multiplication. (The improved locality will probably result in better performance, despite the multiplication.) This could make it more difficult for the compiler to do the above optimization, but if this turns out to be an issue, you can always provide specialized iterators for handling this one particular case. You end up with more readable and more flexible code (e.g. the dimensions don't have to be constant), at little or no loss of performance.




回答3:


As to which choice provides better performance, then the answer will largely depend on your specific circumstances. The only way to know if one way is better or if they are roughly equivalent is to measure the performance of your application.

Some things that would be a factor are: how often you do it, the actual size of the arrays/data, how much memory your system has, and how well your system manages memory.

If you have the luxury of being able to choose between the two choices, it must mean the sizes are already nailed up. Then, you do not need the multiple allocation scheme that you illustrated. You can perform a single dynamic allocation of your 2D array. In C:

int (*array)[COLUMNS];
array = malloc(ROWS * sizeof(*array));

In C++:

std::vector<std::array<int, COLUMNS>> array(ROWS);

As long as the COLUMNS is nailed down, you can perform a single allocation to obtain your 2D array. If neither are nailed down, then you don't really have the choice of using a static array anyway.




回答4:


Often there is a trade off between memory consumption and speed. Empirically, I have witnessed that creating array on stack is faster than allocation on heap. As the array size increases this becomes more apparent.

You can always decrease the memory consumption. For example you can use short or char instead of int etc.

As the array size increases, especially with the use of realloc, there might be a lot more page replacement (up and down) to maintain the contiguous location of items.

You should also consider that there is a lower limit for the size of the things you can store in stack, for heap this limit is higher but as I told with the cost of performance.




回答5:


Stalk memory allocation offers quicker access of data than the Heap. The CPU would look for the address in the cache if it does not have it, if it does not find the address in the cache then it would look up in the main memory. Stalk is a preferred location after cache.



来源:https://stackoverflow.com/questions/17775066/c-c-performance-of-static-arrays-vs-dynamic-arrays

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!