How does one implement alloca() using inline x86 assembler in languages like D, C, and C++? I want to create a slightly modified version of it, but first I need to know how
What we want to do is something like that:
void* alloca(size_t size) {
<sp> -= size;
return <sp>;
}
In Assembly (Visual Studio 2017, 64bit) it looks like:
;alloca.asm
_TEXT SEGMENT
PUBLIC alloca
alloca PROC
sub rsp, rcx ;<sp> -= size
mov rax, rsp ;return <sp>;
ret
alloca ENDP
_TEXT ENDS
END
Unfortunately our return pointer is the last item on the stack, and we do not want to overwrite it. Additionally we need to take care for the alignment, ie. round size up to multiple of 8. So we have to do this:
;alloca.asm
_TEXT SEGMENT
PUBLIC alloca
alloca PROC
;round up to multiple of 8
mov rax, rcx
mov rbx, 8
xor rdx, rdx
div rbx
sub rbx, rdx
mov rax, rbx
mov rbx, 8
xor rdx, rdx
div rbx
add rcx, rdx
;increase stack pointer
pop rbx
sub rsp, rcx
mov rax, rsp
push rbx
ret
alloca ENDP
_TEXT ENDS
END
If you can't use c99's Variable Length Arrays, you can use a compound literal cast to a void pointer.
#define ALLOCA(sz) ((void*)((char[sz]){0}))
This also works for -ansi (as a gcc extension) and even when it is a function argument;
some_func(&useful_return, ALLOCA(sizeof(struct useless_return)));
The downside is that when compiled as c++, g++>4.6 will give you an error: taking address of temporary array ... clang and icc don't complain though
I recommend the "enter" instruction. Available on 286 and newer processors (may have been available on the 186 as well, I can't remember offhand, but those weren't widely available anyways).
Variable-Length Array in pure ISO C++. Proof-of-Concept implementation.
void foo(unsigned n)
{
cps_alloca<Payload>(n,[](Payload *first,Payload *last)
{
fill(first,last,something);
});
}
template<typename T,unsigned N,typename F>
auto cps_alloca_static(F &&f) -> decltype(f(nullptr,nullptr))
{
T data[N];
return f(&data[0],&data[0]+N);
}
template<typename T,typename F>
auto cps_alloca_dynamic(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
vector<T> data(n);
return f(&data[0],&data[0]+n);
}
template<typename T,typename F>
auto cps_alloca(unsigned n,F &&f) -> decltype(f(nullptr,nullptr))
{
switch(n)
{
case 1: return cps_alloca_static<T,1>(f);
case 2: return cps_alloca_static<T,2>(f);
case 3: return cps_alloca_static<T,3>(f);
case 4: return cps_alloca_static<T,4>(f);
case 0: return f(nullptr,nullptr);
default: return cps_alloca_dynamic<T>(n,f);
}; // mpl::for_each / array / index pack / recursive bsearch / etc variacion
}
LIVE DEMO
cps_alloca on github
For the D programming language, the source code for alloca() comes with the download. How it works is fairly well commented. For dmd1, it's in /dmd/src/phobos/internal/alloca.d. For dmd2, it's in /dmd/src/druntime/src/compiler/dmd/alloca.d.