Say I'm writing a routine in x86 assembly, like "add", which adds two numbers passed as arguments.
For the most part this is a very simple method:
Sure.
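push ebp            ; save the caller's frame pointer
mov ebp, esp        ; set up our own frame
mov eax, [ebp+8]    ; eax = first argument
add eax, [ebp+12]   ; eax += second argument (this is the return value)
mov esp, ebp        ; tear down the frame
pop ebp             ; restore the caller's frame pointer
pop ecx             ; these two instructions simulate "ret":
jmp ecx             ; pop the return address, then jump to it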
This assumes you have a free register (e.g., ecx). Writing an equivalent that uses "no registers" is possible (after all, x86 is Turing-complete), but it is likely to involve a lot of convoluted register and stack shuffling.
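To give a feel for what a register-free version might look like, here is a rough sketch of just the epilogue. It is not something to rely on: it reads the return address from below the stack pointer, so anything that uses the stack asynchronously between the two final instructions (a signal handler, for example) could overwrite it. The `dword ptr` spelling assumes a MASM-style assembler; NASM would write `dword` without `ptr`.

```asm
; Sketch only: a "ret" replacement that touches no general-purpose register.
mov esp, ebp              ; tear down the frame as before
pop ebp                   ; restore the caller's frame pointer
lea esp, [esp+4]          ; release the return-address slot (lea leaves flags untouched)
jmp dword ptr [esp-4]     ; jump through the slot just released, which now sits below esp
```

The fragility of that last instruction is the reason to look for somewhere other than the stack to park the return address.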
Most current OSes offer thread-local storage that is accessible through one of the segment registers (fs or gs, depending on the OS and mode). You could then simulate "ret" this way, safely:
pop dword ptr gs:[preallocated_tls_slot]   ; pick one slot; the return address goes there
jmp dword ptr gs:[preallocated_tls_slot]   ; indirect jump through that slot