I doubt it can be done portably, but are there any solutions out there? I think it could be done by creating an alternate stack and resetting SP, BP, and IP on function entry.
I don't think there are many full-blown, clean implementations in C++. One attempt that I like is Adam Dunkels' protothread library.
See also "Protothreads: simplifying event-driven programming of memory-constrained embedded systems" in the ACM Digital Library and the discussion in the Wikipedia article on Protothread.
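To give a feel for the stackless approach, here is a minimal sketch of the switch-on-a-saved-line-number trick that this style of library builds on. The names `Coro`, `CO_BEGIN`, `CO_YIELD`, `CO_END`, and `counter` are made up for illustration and are not the protothreads API; the real library has its own macros and more features.

```cpp
// Hypothetical names for illustration only; this is not the protothreads API.
struct Coro {
    int resume_point = 0;  // 0 means "start from the top"
    int i = 0;             // state that must survive a yield lives here, not in locals
};

// Open a switch so a later call can jump straight to the saved case label.
#define CO_BEGIN(c) switch ((c).resume_point) { case 0:

// Remember the current source line, return to the caller, and mark the spot
// where the next call resumes. Use at most one CO_YIELD per source line.
#define CO_YIELD(c, value) \
    do { (c).resume_point = __LINE__; return (value); case __LINE__:; } while (0)

// Close the switch and mark the coroutine as finished (-1 matches no case).
#define CO_END(c) } (c).resume_point = -1

// Yields 0, 1, 2 on successive calls, then returns -1 on every later call.
int counter(Coro& c) {
    CO_BEGIN(c);
    for (c.i = 0; c.i < 3; ++c.i) {
        CO_YIELD(c, c.i);
    }
    CO_END(c);
    return -1;
}
```

Calling `counter(c)` repeatedly on the same `Coro c` resumes inside the loop each time. No alternate stack is involved, which is why this is portable, but it also means ordinary local variables do not keep their values across a yield and you cannot yield from a nested function call.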