g++: How RVO works when multiple translation units are involved

Question

First, please take a look at the following code, which consists of two translation units.

--- foo.h ---
#ifndef FOO_H
#define FOO_H

#include <string>

class Foo
{
public:
    Foo();
    Foo(const Foo& rhs);
    void print() const;
private:
    std::string str_;
};

Foo getFoo();

#endif // FOO_H

--- foo.cpp ---
#include "foo.h"

#include <iostream>

Foo::Foo() : str_("hello")
{
    std::cout << "Default Ctor" << std::endl;
}

Foo::Foo(const Foo& rhs) : str_(rhs.str_)
{
    std::cout << "Copy Ctor" << std::endl;
}

void Foo::print() const
{
    std::cout << "print [" << str_ << "]" << std::endl;
}

Foo getFoo()
{
    return Foo(); // Expecting RVO
}

--- main.cpp ---
#include "foo.h"

int main()
{
    Foo foo = getFoo();
    foo.print();
}

Note that foo.cpp and main.cpp are separate translation units. As I understand it, this means the implementation of getFoo() is not visible when main.cpp is compiled into main.o.

However, when I compile and run the program, "Copy Ctor" is never printed, which indicates that RVO is applied here.

I would really appreciate it if someone could explain how this is achieved even though the implementation of getFoo() is not exposed to the translation unit main.o.

I ran this experiment with GCC (g++) 4.4.6.

Answer 1:

The compiler simply has to work consistently.

In other words, the compiler has to look solely at a return type, and based on that type, decide how a function returning an object of that type will return the value.

At least in a typical case, that decision is fairly trivial. The calling convention sets aside a register (or possibly two) for return values (e.g., on an Intel/AMD x86/x64 that'll normally be EAX or RAX). Any type small enough to fit into that will be returned there. For any type too large to fit there, the function will receive a hidden pointer/reference parameter that tells it where to deposit the return result. (On common C++ ABIs, such as the Itanium C++ ABI that g++ follows on most platforms, a class with a non-trivial copy constructor or destructor, like Foo above, is also returned through the hidden pointer regardless of its size.) Note that this much applies without RVO/NRVO being involved at all; in fact, it applies equally to C code that returns a struct as it does to C++ returning a class object. Although returning a struct probably isn't quite as common in C as in C++, it's still allowed, and the compiler has to be able to compile code that does it.
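The hidden-pointer convention can be imitated by hand in ordinary C++. The following is a minimal sketch, not what g++ actually emits: Big, makeBig, makeBigLowered and sameResult are hypothetical names introduced for illustration, Big is merely assumed to be too large for the return registers, and the second function only mimics what a compiler might generate for the first. It assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <cstring>

// A type assumed to be too large to fit in the return registers.
struct Big {
    char text[64];
};

// What the programmer writes: return by value.
Big makeBig() {
    Big b;
    std::strcpy(b.text, "hello");
    return b;
}

// Roughly what the calling convention turns it into: the caller passes
// the address of storage it owns, and the callee fills that storage in.
void makeBigLowered(Big* result) {
    assert(result != nullptr);
    std::strcpy(result->text, "hello");
}

// Both forms leave the same contents in the caller's object.
bool sameResult() {
    Big byValue = makeBig();
    Big viaPointer;
    makeBigLowered(&viaPointer);
    return std::strcmp(byValue.text, viaPointer.text) == 0;
}
```

Because the caller only needs the function's return type to know that it must pass such a pointer, nothing about the function body has to be visible at the call site.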

There are really two separate (possible) copies that can be eliminated. One is that the compiler may allocate space on the stack for a local holding what will be the return value, then copy from there to where the pointer refers during the return.

The second is a possible copy from that return location into some other location where the value really needs to end up.

The first gets eliminated inside the function itself, but has no effect on its external interface. It ultimately puts the data wherever the hidden pointer tells it to; the only question is whether it creates a local copy first, or always works directly with the return location. Obviously with [N]RVO, it always works directly.

The second possible copy is from that (potential) temporary into wherever the value really needs to end up. This is eliminated by optimizing the calling sequence, not the function itself -- i.e., giving the function a pointer to the final destination for that return value, rather than to some temporary location, from which the compiler will then copy the value into its destination.
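To make the two copies concrete, here is a hand-written emulation that counts them. It is only a sketch under stated assumptions: Counted and all four functions are hypothetical, placement new into caller-owned buffers stands in for the hidden pointer, and the code spells out explicitly what a compiler would otherwise do implicitly. It assumes C++11 or later.

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical type that counts how often it is copy-constructed.
struct Counted {
    static int copies;
    std::string str;
    Counted() : str("hello") {}
    Counted(const Counted& rhs) : str(rhs.str) { ++copies; }
};
int Counted::copies = 0;

// Callee WITHOUT the first elision: build a local object, then copy it
// into the slot the caller pointed at.
Counted* calleeWithCopy(void* slot) {
    assert(slot != nullptr);
    Counted local;
    return new (slot) Counted(local);   // copy #1: local -> return slot
}

// Callee WITH the first elision: build directly in the slot.
Counted* calleeDirect(void* slot) {
    assert(slot != nullptr);
    return new (slot) Counted();
}

// Caller WITHOUT the second elision: pass a temporary slot, then copy
// from the temporary into the real destination.
int copiesWithoutElision() {
    Counted::copies = 0;
    alignas(Counted) unsigned char tempSlot[sizeof(Counted)];
    alignas(Counted) unsigned char destSlot[sizeof(Counted)];
    Counted* temp = calleeWithCopy(tempSlot);
    Counted* dest = new (destSlot) Counted(*temp);  // copy #2: temp -> dest
    temp->~Counted();
    dest->~Counted();
    return Counted::copies;
}

// Caller WITH the second elision: pass the final destination itself.
int copiesWithElision() {
    Counted::copies = 0;
    alignas(Counted) unsigned char destSlot[sizeof(Counted)];
    Counted* dest = calleeDirect(destSlot);
    dest->~Counted();
    return Counted::copies;
}
```

In this emulation copiesWithoutElision() returns 2 and copiesWithElision() returns 0. Note that calleeWithCopy and calleeDirect have the same signature, which mirrors the point that the first elision does not change the function's external interface.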

Answer 2:

main doesn't need the implementation details of getFoo for RVO to occur. It simply expects the return value to be in some register after getFoo exits.

getFoo has two options for this: create an object in its own scope and then copy (or move) it to the return register, or create the object directly in that register, which is what happens here.

It's not telling main to look anywhere else, nor does it need to. It just uses the return register directly.

Answer 3:

(N)RVO is unrelated to translation units. The term is commonly used to refer to two different copy elisions: one applied inside the function (from a local variable to the returned value) and one applied by the caller (from the returned value to a local variable). They should be discussed separately.

Proper RVO

This is performed strictly inside a function. Consider:

T foo() {
   T local;
   // operate on local
   return local;
}

Conceptually there are two objects: local and the returned object. The compiler can analyze the function locally and determine that the lifetimes of both objects are bound together: local only lives to serve as the source of a copy to the returned value. The compiler can then merge both objects into a single one and use that.
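One way to observe this is to count copy and move constructions around a function that returns a named local. The sketch below assumes C++11 or later (for the move constructor); Tracked, makeNamed and movesForNamedReturn are hypothetical names. The standard permits but does not require this elision, so the move count is whatever your compiler decides; g++ and Clang normally report 0 with default settings.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type that counts copy and move constructions.
struct Tracked {
    static int copies;
    static int moves;
    std::string payload;
    Tracked() : payload("hello") {}
    Tracked(const Tracked& rhs) : payload(rhs.payload) { ++copies; }
    Tracked(Tracked&& rhs) noexcept : payload(std::move(rhs.payload)) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Returns a named local: the candidate for NRVO.
Tracked makeNamed() {
    Tracked local;
    local.payload += ", world";
    return local;
}

// Number of move constructions needed to get the value into `result`.
int movesForNamedReturn() {
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked result = makeNamed();
    assert(result.payload == "hello, world");
    return Tracked::moves;
}
```

If the elision is not performed, the local is moved rather than copied, so Tracked::copies stays at 0 either way.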

Copy elision on the caller side

On the caller side, consider T x = foo();. Again there are two objects: the object returned from foo() and x. And again the compiler can determine that their lifetimes are bound together and place both objects at the same location.

Further reading:
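A simple way to check that both objects share one location is to compare addresses. In this sketch (hypothetical names Probe, makeProbe and builtInPlace; C++11 or later assumed for nullptr), the default constructor records where it ran, and the caller compares that address with the address of its own variable. Since C++17 this elision is guaranteed for a returned temporary; before that it is an optimization that g++ applies by default.

```cpp
#include <cassert>
#include <string>

// Hypothetical type that remembers where it was default-constructed.
struct Probe {
    static const Probe* defaultConstructedAt;
    std::string str;
    Probe() : str("hello") { defaultConstructedAt = this; }
    Probe(const Probe& rhs) : str(rhs.str) {}
};
const Probe* Probe::defaultConstructedAt = nullptr;

// Returns a temporary, like getFoo() in the question.
Probe makeProbe() {
    return Probe();
}

// True if the object built inside makeProbe() is the caller's x itself.
bool builtInPlace() {
    Probe x = makeProbe();
    assert(Probe::defaultConstructedAt != nullptr);
    return Probe::defaultConstructedAt == &x;
}
```

The caller never needs to see the body of makeProbe() for this to work: it simply passes the address of x as the place where the result should be built.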

Value semantics: NRVO
Value semantics: Copy elision

Source: https://stackoverflow.com/questions/11615231/g-how-rvo-works-in-case-that-multiple-translation-units-are-involved

Tags

c++

g++

rvo