How does one implement alloca() using inline x86 assembler in languages like D, C, and C++? I want to create a slightly modified version of it, but first I need to know how
The C and C++ standards don't specify that alloca() has to the use the stack, because alloca() isn't in the C or C++ standards (or POSIX for that matter)¹.
A compiler may also implement alloca() using the heap. For example, the ARM RealView (RVCT) compiler's alloca() uses malloc() to allocate the buffer (referenced on their website here), and also causes the compiler to emit code that frees the buffer when the function returns. This doesn't require playing with the stack pointer, but still requires compiler support.
Microsoft Visual C++ has a _malloca() function that uses the heap if there isn't enough room on the stack, but it requires the caller to use _freea(), unlike _alloca(), which does not need/want explicit freeing.
(With C++ destructors at your disposal, you can obviously do the cleanup without compiler support, but you can't declare local variables inside an arbitrary expression so I don't think you could write an alloca() macro that uses RAII. Then again, apparently you can't use alloca() in some expressions (like function parameters) anyway.)
¹ Yes, it's legal to write an alloca() that simply calls system("/usr/games/nethack").