I doubt it can be done portably, but are there any solutions out there? I think it could be done by creating an alternate stack and reseting SP,BP, and IP on function entry
There's no easy way to implement coroutine. Because coroutine itself is out of C/C++'s stack abstraction just like thread. So it cannot be supported without language level changes to support.
Currently(C++11), all existing C++ coroutine implementations are all based on assembly level hacking which is hard to be safe and reliable crossing over platforms. To be reliable it needs to be standard, and handled by compilers rather than hacking.
There's a standard proposal - N3708 for this. Check it out if you're interested.