stackless coroutines (C++20) rely on a code transformation by the compiler (the coroutine body is turned into a state machine)
stackless in this case means that the application stack is not used to store local variables (for instance, the variables of your algorithm)
otherwise the local variables of the stackless coroutine would be overwritten by invocations of ordinary functions after suspending the stackless coroutine
stackless coroutines still need memory for their local variables: when the coroutine is suspended, its local variables have to be preserved
for this purpose stackless coroutines allocate and use a so-called activation record (equivalent to a stack frame)
suspending from a deep call stack is only possible if all functions in between are stackless coroutines too (viral; otherwise you would get a corrupted stack)
some clang developers are sceptical that the Heap Allocation eLision Optimization (HALO), which would avoid the heap allocation of the activation record, can always be applied
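to make the state-machine and activation-record points above concrete, here is a minimal hand-written sketch of roughly what such a transformation could look like for a coroutine that yields the integers in [first, last). it is not actual compiler output; `RangeFrame`, `resume`, `make_range` and `collect` are hypothetical names chosen for illustration, and the heap allocation via `std::make_unique` only stands in for the allocation of the activation record.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical, hand-written stand-in for a coroutine equivalent to:
//   for (int i = first; i < last; ++i) co_yield i;
// This is an illustration, not what any particular compiler emits.
struct RangeFrame {      // plays the role of the activation record
    int state = 0;       // which suspension point to resume from
    int i = 0;           // local variable that must survive suspension
    int last = 0;        // parameter copied into the frame
    int current = 0;     // value handed to the caller on suspension
};

// Resume the "coroutine". Returns true if it suspended with a value in
// f.current, false once it has finished.
bool resume(RangeFrame& f) {
    switch (f.state) {
    case 0:              // first entry: i was initialised at creation
        break;
    case 1:              // resumed after the yield: run the loop increment
        ++f.i;
        break;
    default:             // already finished
        return false;
    }
    if (f.i < f.last) {
        f.current = f.i;
        f.state = 1;
        return true;     // a plain return: nothing stays on the application stack
    }
    f.state = 2;
    return false;
}

// Creating the coroutine allocates its frame (here: on the heap).
std::unique_ptr<RangeFrame> make_range(int first, int last) {
    auto f = std::make_unique<RangeFrame>();
    f->i = first;
    f->last = last;
    return f;
}

// Drive the state machine to completion and collect the yielded values.
std::vector<int> collect(int first, int last) {
    auto frame = make_range(first, last);
    std::vector<int> out;
    while (resume(*frame)) out.push_back(frame->current);
    return out;
}
```

calling `collect(3, 6)` resumes the frame until it reports completion. between two calls to `resume`, the only place where `i` lives is the frame, which is why ordinary function calls made in the meantime cannot overwrite it.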
stackful coroutines
in essence, a stackful coroutine simply switches the stack and the instruction pointer
it allocates a side-stack that works like an ordinary stack (storing local variables, advancing the stack pointer for called functions)
the side-stack needs to be allocated only once (it can also be pooled) and all subsequent function calls are fast (because they only advance the stack pointer)
each stackless coroutine requires its own activation record -> when called in a deep call chain, a lot of activation records have to be created/allocated
stackful coroutines allow suspending from a deep call chain while the functions in between can be ordinary functions (not viral)
a stackful coroutine can outlive its caller/creator
one version of the skynet benchmark spawns 1 million stackful coroutines and shows that stackful coroutines are very efficient (outperforming the version using threads)
a version of the skynet benchmark using stackless coroutines has not been implemented yet
boost.context represents the thread's primary stack as a stackful coroutine/fiber - even on ARM
boost.context supports on-demand growing stacks (GCC split stacks)
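the stackful points above (side-stack, switching stack and instruction pointer, suspending from a deep call chain through ordinary functions) can be sketched with the POSIX `ucontext` functions. this is a minimal illustration and not how boost.context is implemented or used; it assumes a platform that still provides `getcontext`/`makecontext`/`swapcontext` (for example Linux with glibc), and all names inside the `sketch` namespace are hypothetical.

```cpp
#include <cassert>
#include <ucontext.h>
#include <vector>

// Teaching sketch only: hypothetical names, global state, single coroutine.
namespace sketch {

ucontext_t main_ctx;     // saved context of the caller (primary stack)
ucontext_t coro_ctx;     // saved context of the coroutine (side-stack)
int yielded_value = 0;
bool finished = false;

// Suspend the coroutine: save its registers (including stack pointer and
// instruction pointer) and continue the caller where it last switched to us.
void yield(int v) {
    yielded_value = v;
    swapcontext(&coro_ctx, &main_ctx);
}

// Ordinary functions: nothing marks them as coroutine-aware.
void leaf(int v) { yield(v); }
void middle(int v) { leaf(v); leaf(v + 1); }

// Coroutine body. It suspends two call levels deep, inside leaf().
void body() {
    middle(10);
    middle(20);
    finished = true;     // returning continues at uc_link, i.e. main_ctx
}

std::vector<int> run() {
    std::vector<char> side_stack(64 * 1024);   // allocated once

    int rc = getcontext(&coro_ctx);
    assert(rc == 0);
    (void)rc;
    coro_ctx.uc_stack.ss_sp = side_stack.data();
    coro_ctx.uc_stack.ss_size = side_stack.size();
    coro_ctx.uc_link = &main_ctx;               // where to go when body() returns
    makecontext(&coro_ctx, body, 0);

    std::vector<int> out;
    finished = false;
    for (;;) {
        swapcontext(&main_ctx, &coro_ctx);      // resume the coroutine
        if (finished) break;
        out.push_back(yielded_value);
    }
    return out;
}

}  // namespace sketch
```

`sketch::run()` resumes the coroutine until its body returns. the frames of `middle` and `leaf` simply stay on the side-stack while the coroutine is suspended, so neither function had to be rewritten as a coroutine.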