I have read some questions about returning more than one value such as What is the reason behind having only one return value in C++ and Java?, Returning multiple values fro
Yes, this is sometimes done. If you read the Wikipedia page on x86 calling conventions under cdecl:
There are some variations in the interpretation of cdecl, particularly in how to return values. As a result, x86 programs compiled for different operating system platforms and/or by different compilers can be incompatible, even if they both use the "cdecl" convention and do not call out to the underlying environment. Some compilers return simple data structures with a length of 2 registers or less in the register pair EAX:EDX, and larger structures and class objects requiring special treatment by the exception handler (e.g., a defined constructor, destructor, or assignment) are returned in memory. To pass "in memory", the caller allocates memory and passes a pointer to it as a hidden first parameter; the callee populates the memory and returns the pointer, popping the hidden pointer when returning.
(emphasis mine)
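To make the "returned in memory" case from the quote a bit more concrete, here is a rough sketch of what the hidden first parameter amounts to, written out by hand. The names `Big`, `make_big`, `make_big_lowered` and `same_result` are made up for illustration, and the explicit-pointer version only models what a compiler might do; it is not guaranteed to match any particular ABI.

```cpp
#include <cassert>

// Hypothetical struct, chosen to be too large for the EAX:EDX pair.
struct Big {
    int a, b, c, d;
};

// What you write: an ordinary return by value.
Big make_big() {
    Big r;
    r.a = 1; r.b = 2; r.c = 3; r.d = 4;
    return r;
}

// Roughly what "returned in memory" amounts to: the caller passes the address
// of the result slot as an extra first argument, the callee fills it in and
// returns the same pointer.
Big* make_big_lowered(Big* hidden_result) {
    hidden_result->a = 1;
    hidden_result->b = 2;
    hidden_result->c = 3;
    hidden_result->d = 4;
    return hidden_result;
}

// Both forms leave the same data in storage owned by the caller.
bool same_result() {
    Big by_value = make_big();
    Big slot;                          // caller-allocated result slot
    Big* p = make_big_lowered(&slot);
    assert(p == &slot);                // callee hands back the pointer it was given
    return by_value.a == slot.a && by_value.b == slot.b &&
           by_value.c == slot.c && by_value.d == slot.d;
}
```

Either way the caller ends up owning the memory that holds the result; the difference is only in who writes the pointer passing down explicitly.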
Ultimately, it comes down to calling convention. It's possible for your compiler to optimize your code to use whatever registers it wants, but when your code interacts with other code (like the operating system), it needs to follow the standard calling conventions, which typically use 1 register for returning values.
Returning on the stack isn't necessarily slower, because once the values are available in L1 cache (which is often the case for the stack), accessing them is very fast.
However, most computer architectures have at least 2 registers for return values, so values that are twice (or more) as wide as the word size can be returned in registers: edx:eax in x86, rdx:rax in x86_64, $v0 and $v1 in MIPS (Why MIPS assembler has more that one register for return value?), R0:R3 in ARM¹, X0:X7 in ARM64... The ones that don't are mostly microcontrollers with only one accumulator or a very limited number of registers.
¹ "If the type of value returned is too large to fit in r0 to r3, or whose size cannot be determined statically at compile time, then the caller must allocate space for that value at run time, and pass a pointer to that space in r0."
These registers can also be used to directly return small structs that fit in 2 registers or fewer (or more, depending on the architecture and ABI).
For example, with the following code:
struct Point
{
    int x, y;
};
struct shortPoint
{
    short x, y;
};
struct Point3D
{
    int x, y, z;
};
Point P1()
{
    Point p;
    p.x = 1;
    p.y = 2;
    return p;
}
Point P2()
{
    Point p;
    p.x = 1;
    p.y = 0;
    return p;
}
shortPoint P3()
{
    shortPoint p;
    p.x = 1;
    p.y = 0;
    return p;
}
Point3D P4()
{
    Point3D p;
    p.x = 1;
    p.y = 2;
    p.z = 3;
    return p;
}
Clang emits the following instructions for x86_64:
P1(): # @P1()
    movabs rax, 8589934593
    ret
P2(): # @P2()
    mov eax, 1
    ret
P3(): # @P3()
    mov eax, 1
    ret
P4(): # @P4()
    movabs rax, 8589934593
    mov edx, 3
    ret
For ARM64:
P1():
    mov x0, 1
    orr x0, x0, 8589934592
    ret
P2():
    mov x0, 1
    ret
P3():
    mov w0, 1
    ret
P4():
    mov x1, 1
    mov x0, 0
    sub sp, sp, #16
    bfi x0, x1, 0, 32
    mov x1, 2
    bfi x0, x1, 32, 32
    add sp, sp, 16
    mov x1, 3
    ret
As you can see, no values are written to or read from the stack (the ARM64 P4 adjusts sp, but never stores to or loads from it). You can switch to other compilers to see that the values are mainly returned in registers.
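If the constant 8589934593 in the listings looks odd, it is simply both fields of the Point packed into one 64-bit register: 8589934593 = (2 << 32) | 1, i.e. y = 2 in the upper half and x = 1 in the lower half. Here's a small sketch to check that arithmetic. The helper names `pack`, `as_register` and `matches_listing` are made up, and `as_register` assumes a 4-byte int, no padding and a little-endian target (true for x86_64, and for ARM64 as it is usually configured).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Point {
    int x, y;
};

// Pure arithmetic: x in the low 32 bits, y in the high 32 bits.
std::uint64_t pack(int x, int y) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(y)) << 32) |
           static_cast<std::uint32_t>(x);
}

// Reinterpret the 8 bytes of a Point as one 64-bit integer, which is how the
// struct sits in rax / x0 on a little-endian machine.
std::uint64_t as_register(const Point& p) {
    static_assert(sizeof(Point) == sizeof(std::uint64_t),
                  "sketch assumes 4-byte int and no padding");
    std::uint64_t bits;
    std::memcpy(&bits, &p, sizeof bits);
    return bits;
}

// P1 returns {1, 2}; both views should give the constant from the listing.
bool matches_listing() {
    Point p;
    p.x = 1;
    p.y = 2;
    assert(pack(p.x, p.y) == 8589934593ULL);
    return as_register(p) == pack(p.x, p.y);
}
```

The same reading explains P2: with y = 0 the packed value is just 1, so a single `mov eax, 1` is enough (writing eax zeroes the upper half of rax). The ARM64 P1 builds the same value in two steps, since 8589934592 is 2 << 32.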
Return data is put on the stack (or, for small values, in registers). Returning a struct by copy is literally the same thing as returning multiple values, in that all its data members are put there. If you want multiple return values, that is the simplest way. I know in Lua that's exactly how it handles it: it just wraps them in a struct. Why was it never implemented? Probably because you could just do it with a struct, so why implement a different method?
As for C++, it actually does support multiple return values, but in the form of a special class (std::tuple), really the same way Java handles multiple return values (tuples) as well. So in the end, it's all the same: either you copy the data raw (a struct/object returned by value, not through a pointer or reference) or you just copy a pointer to a collection that stores multiple values.
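As a rough illustration of the two C++ options mentioned above (a plain struct returned by copy versus a library tuple class), here's a minimal sketch. `DivResult`, `div_struct`, `div_tuple` and `sum_parts` are hypothetical names used only for this example; `std::tuple`, `std::make_tuple` and `std::tie` are the standard library facilities from `<tuple>`.

```cpp
#include <cassert>
#include <tuple>

// Option 1: a plain struct carrying both results, returned by copy.
struct DivResult {
    int quotient;
    int remainder;
};

DivResult div_struct(int a, int b) {
    DivResult r;
    r.quotient = a / b;
    r.remainder = a % b;
    return r;
}

// Option 2: the standard tuple class, no custom type needed.
std::tuple<int, int> div_tuple(int a, int b) {
    return std::make_tuple(a / b, a % b);
}

// Usage: std::tie unpacks the tuple into two ordinary variables.
int sum_parts(int a, int b) {
    int q, r;
    std::tie(q, r) = div_tuple(a, b);
    assert(q == div_struct(a, b).quotient);  // both options agree
    return q + r;
}
```

The struct version gives the fields names, which tends to read better at the call site; the tuple version saves declaring a type for a one-off pair of values.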