Looking at this question, Why does a C/C++ compiler need know the size of an array at compile time?, it occurred to me that compiler implementers should have had some time by now to get to grips with variable length arrays (VLAs).
If it were to be implemented in VC++, I would assume the compiler team would use some variant of _alloca(size). And I think the cost is equivalent to using variables with greater than 8-byte alignment on the stack (such as __m128): the compiler has to store the original stack pointer somewhere, and aligning the stack requires an extra register to hold the unaligned stack pointer.
So the overhead is basically an extra indirection (you have to store the address of the VLA somewhere) plus register pressure from keeping the original stack pointer around as well.
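For concreteness, this is roughly what I have in mind; a hand-written sketch using _alloca, not a claim about what the compiler actually emits:

```c
#include <malloc.h>   /* MSVC: _alloca */
#include <stdio.h>

void use_vla_by_hand(int n)
{
    /* what `int a[n];` might be lowered to, written out by hand: */
    int *a = (int *)_alloca(n * sizeof(int));  /* bump the stack pointer at run time */

    for (int i = 0; i < n; ++i)
        a[i] = i;                              /* every access goes through the pointer */

    printf("%d\n", a[n - 1]);
}   /* the memory is reclaimed when the frame is torn down */
```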
How are VLAs implemented in gcc / VC++?
AFAIK VC++ doesn't implement VLA. It's a C++ compiler and it only supports C89 (no VLA, no restrict). I don't know how gcc implements VLAs, but the fastest possible way is to store the pointer to the VLA and its size in the static portion of the stack frame. That way one of the VLAs can be accessed with the performance of a constant-sized array: the last VLA if the stack grows downwards, as on x86 (dereference [stack pointer + index * element size + size of the last temporary pushes]), or the first VLA if it grows upwards (dereference [stack-frame pointer + offset from the stack frame + index * element size]). All the other VLAs need one more indirection to fetch their base address from the static portion of the stack frame.
[ Edit: Also, when using a VLA the compiler can't omit the stack-frame base pointer, which would otherwise be redundant because every offset from the stack pointer can be computed at compile time. So you have one less free register. — end edit ]
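To illustrate the layout I have in mind (a hypothetical sketch for a downward-growing stack like x86, not gcc's actual code generation):

```c
void two_vlas(int n, int m)
{
    int a[n];   /* allocated first: its base and size get spilled to the fixed part of the frame */
    int b[m];   /* allocated last: its storage starts right at the current stack pointer */

    /* Conceptually, b can be addressed straight off the stack pointer:
     *     b[i]  ->  *(int *)(stack_pointer + i * sizeof(int))
     * while a needs one extra load to fetch its saved base first:
     *     a[i]  ->  base = frame_slot_of_a;  *(int *)(base + i * sizeof(int))
     */
    for (int i = 0; i < n && i < m; ++i)
        b[i] = a[i];
}
```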
Is the cost really that impressive?
Not really. Moreover, if you don't use it, you don't pay for it.
[ Edit: Probably a more correct answer would be: compared to what? Compared to a heap-allocated vector, the access time will be the same, but allocation and deallocation will be faster. — end edit ]
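As a rough sketch of that comparison (assuming C99 and an n small enough to fit on the stack; the function names are made up for illustration):

```c
#include <stdlib.h>

long sum_vla(int n, const int *src)
{
    int buf[n];                     /* "allocation" is just a stack-pointer adjustment */
    long s = 0;
    for (int i = 0; i < n; ++i) { buf[i] = src[i]; s += buf[i]; }
    return s;                       /* "deallocation" is free when the frame unwinds */
}

long sum_heap(int n, const int *src)
{
    int *buf = malloc(n * sizeof *buf);   /* goes through the allocator... */
    if (!buf) return -1;
    long s = 0;
    for (int i = 0; i < n; ++i) { buf[i] = src[i]; s += buf[i]; }
    free(buf);                            /* ...and back through it again */
    return s;                             /* element access itself costs the same in both */
}
```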