const reference to temporary vs. return value optimization

Question

I'm aware that binding an rvalue to a const lvalue reference extends the temporary's lifetime until the end of the scope. However, it is not clear to me when to use this and when to rely on return value optimization.

LargeObject lofactory( ... ) {
    // construct a LargeObject in a way that is OK for RVO/NRVO
}

int main() {
    const LargeObject& mylo1 = lofactory( ... ); // using const&
    LargeObject mylo2 = lofactory( ... ); // same as above because of RVO/NRVO ?
}

According to Scott Meyers' More Effective C++ (Item 20), the compiler may optimize the second method to construct the object in place (which would be ideal, and exactly what one tries to achieve with the const& in the first method).

Are there any generally accepted rules or best practices for when to use const& to temporaries and when to rely on RVO/NRVO?
Could there be a situation in which using the const& method is worse than not using it? (I'm thinking, for example, of C++11 move semantics if LargeObject implements them...)

Answer 1:

Let's consider the simplest case:

lofactory( ... ).some_method();

In this case, one copy is possible (from the value returned by lofactory into the caller's context), but it can be optimized away by RVO/NRVO.

LargeObject mylo2 ( lofactory( ... ) );

In this case, the possible copies are:

1. Returning the temporary from lofactory to the caller's context: can be optimized away by RVO/NRVO.
2. Copy-constructing mylo2 from that temporary: can be optimized away by copy elision.

const LargeObject& mylo1 = lofactory( ... );

In this case, one copy is still possible:

Returning the temporary from lofactory to the caller's context: can be optimized away by RVO/NRVO (here too!).

The reference then binds to this temporary, and the temporary's lifetime is extended to match the reference's.
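A minimal sketch of that lifetime extension, assuming a hypothetical `Tracked` type (standing in for LargeObject) that counts live instances in a global counter, and a hypothetical factory `make_tracked`: the temporary stays alive while the `const&` is in scope and is destroyed when the enclosing block ends.

```cpp
#include <cassert>

// Hypothetical stand-in for LargeObject: counts how many instances are alive.
int g_alive = 0;

struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) { ++g_alive; }
    Tracked(const Tracked& other) : value(other.value) { ++g_alive; }
    ~Tracked() { --g_alive; }
};

// Hypothetical factory returning by value, like lofactory.
Tracked make_tracked(int v) { return Tracked(v); }

struct LifetimeResult {
    int alive_inside;  // live Tracked objects while the reference is in scope
    int value_inside;  // value read through the reference
    int alive_after;   // live Tracked objects after the scope has ended
};

LifetimeResult lifetime_demo() {
    LifetimeResult r{};
    {
        const Tracked& ref = make_tracked(42); // temporary bound to const&
        r.alive_inside = g_alive;              // the temporary is still alive
        r.value_inside = ref.value;
    }                                          // temporary destroyed here
    r.alive_after = g_alive;
    return r;
}
```

Called once, `lifetime_demo()` should report one live object with value 42 inside the block and none after it, whether or not the compiler elides the copy inside `make_tracked`.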

So,

Are there any generally accepted rules or best practices for when to use const& to temporaries and when to rely on RVO/NRVO?

As I said above, even in the const& case an unnecessary copy is possible, and it can be optimized away by RVO/NRVO.

If your compiler applies RVO/NRVO in a given case, it will most likely also perform copy elision at stage 2 (above), because copy elision there is much simpler than NRVO.

But in the worst case, you will get one copy in the const& case and two copies when you initialize a value.
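A minimal sketch that counts those copies, assuming a hypothetical copy-only type `CopyOnly` and factory `lofactory_copy` (not from the question). The exact counts depend on the compiler, its flags and the language standard (GCC and Clang's `-fno-elide-constructors` disables the optional elisions), but they should never exceed the worst case above.

```cpp
#include <cassert>

// Hypothetical copy-only LargeObject: the user-declared copy constructor
// suppresses the implicit move constructor, so every transfer is a counted copy.
struct CopyOnly {
    static int copies;
    CopyOnly() {}
    CopyOnly(const CopyOnly&) { ++copies; }
};
int CopyOnly::copies = 0;

// Hypothetical factory written so that NRVO is possible.
CopyOnly lofactory_copy() {
    CopyOnly result;
    return result;
}

int copies_for_value() {
    CopyOnly::copies = 0;
    CopyOnly mylo2 = lofactory_copy(); // worst case: 2 copies
    (void)mylo2;
    return CopyOnly::copies;
}

int copies_for_const_ref() {
    CopyOnly::copies = 0;
    const CopyOnly& mylo1 = lofactory_copy(); // worst case: 1 copy
    (void)mylo1;
    return CopyOnly::copies;
}
```

With default optimizations, mainstream compilers typically report 0 for both; with elision disabled under C++11 you should see 2 and 1.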

Could there be a situation in which using the const& method is worse than not using it?

I don't think there are such cases, at least unless your compiler uses strange rules that discriminate against const&. (As an example of a similar situation, I noticed that MSVC does not do NRVO for aggregate initialization.)

(I'm thinking, for example, of C++11 move semantics if LargeObject implements them...)

In C++11, if LargeObject has move semantics, then in the worst case you will get one move in the const& case and two moves when you initialize a value. So const& is still slightly better.
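The same count with a move-enabled type, as a sketch assuming a hypothetical `Movable` class and factory `lofactory_move`: because a returned local is treated as an rvalue, no copies happen at all, and the remaining moves are bounded by the worst case above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical move-enabled LargeObject that counts copies and moves.
struct Movable {
    static int copies;
    static int moves;
    std::vector<double> data;
    Movable() : data(1000, 0.0) {}
    Movable(const Movable& o) : data(o.data) { ++copies; }
    Movable(Movable&& o) : data(std::move(o.data)) { ++moves; }
};
int Movable::copies = 0;
int Movable::moves = 0;

// Hypothetical factory: NRVO candidate, otherwise implicitly moved (never copied).
Movable lofactory_move() {
    Movable result;
    return result;
}

struct Counts { int copies; int moves; };

Counts counts_for_value() {
    Movable::copies = Movable::moves = 0;
    Movable mylo2 = lofactory_move(); // worst case: 2 moves
    (void)mylo2;
    Counts c = {Movable::copies, Movable::moves};
    return c;
}

Counts counts_for_const_ref() {
    Movable::copies = Movable::moves = 0;
    const Movable& mylo1 = lofactory_move(); // worst case: 1 move
    (void)mylo1;
    Counts c = {Movable::copies, Movable::moves};
    return c;
}
```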

So a good rule would be to always bind temporaries to const& when possible, since it might prevent a copy if the compiler fails to perform copy elision for some reason?

Without knowing the actual context of the application, this seems like a good rule.

In C++11 it is also possible to bind a temporary to an rvalue reference (LargeObject&&), in which case the temporary can be modified.
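A short sketch of the rvalue-reference form, assuming hypothetical definitions of `LargeObject` and `lofactory` (the question leaves both abstract): the lifetime is extended exactly as with `const&`, but the object can also be modified through the reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for LargeObject.
struct LargeObject {
    std::vector<std::string> items;
};

// Hypothetical factory returning by value.
LargeObject lofactory() {
    LargeObject lo;
    lo.items.push_back("first");
    return lo;
}

std::size_t modify_through_rvalue_ref() {
    LargeObject&& mylo = lofactory(); // lifetime extended, as with const&
    mylo.items.push_back("second");   // allowed: the reference is not const
    return mylo.items.size();
}
```

With `const LargeObject& mylo` instead, the `push_back` call would not compile.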

By the way, move semantics can be emulated in C++98/03 with various tricks. For instance:

- Mojo / Boost.Move
- Bjarne Stroustrup describes another trick that uses a small mutable flag inside the class, and mentions example code for it.

However, even with move semantics, some objects cannot be moved cheaply: for instance, a 4x4 matrix class that stores double data[4][4] inline. So copy elision, RVO, and NRVO are still very important, even in C++11. And when copy elision/RVO/NRVO does happen, it is faster than a move.
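Why a move does not help for such a matrix can be sketched as follows, assuming a plain hypothetical `Matrix4` struct: its data lives inline, so there is no heap pointer to steal, and its implicit move constructor simply copies all 16 doubles. Only eliding the operation avoids that work.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical 4x4 matrix with inline storage, as described above.
struct Matrix4 {
    double data[4][4];
};

static_assert(std::is_trivially_copyable<Matrix4>::value,
              "moving a Matrix4 is a plain copy of its elements");
static_assert(sizeof(Matrix4) >= 16 * sizeof(double),
              "all 16 doubles are stored inline");

// True if the moved-from matrix still holds its values,
// i.e. the "move" was really a copy of every element.
bool move_is_copy() {
    Matrix4 a = {};
    a.data[3][3] = 1.0;
    Matrix4 b = std::move(a);
    return b.data[3][3] == 1.0 && a.data[3][3] == 1.0;
}
```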

P.S. In real code, there are some additional things to consider:

For instance, if you have a function that returns a vector, it may still not be 100% efficient even when move/RVO/NRVO/copy elision is applied. Consider the following case:

while(/*...*/)
{
    vector<some> v = produce_next(/* ... */); // Move/RVO/NRVO are applied
    // ...
}

It is more efficient to change the code to:

vector<some> v;
while(/*...*/)
{
    v.clear();

    produce_next( v ); // fill v
    // or something like:
    produce_next( back_inserter(v) );
    // ...
}

This is faster because the memory already allocated inside v can be reused whenever v.capacity() is large enough (clear() destroys the elements but keeps the capacity), so produce_next does not need to allocate new memory on each iteration.
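A runnable sketch of the reuse pattern, assuming hypothetical implementations of both `produce_next` variants (named `produce_next` and `produce_next_into` here to keep them apart) that each emit 100 ints per call. After the first round, the vector's buffer address should stay the same, because clear() keeps the capacity and no later round needs more than 100 elements.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical producer that fills a caller-owned vector.
void produce_next(std::vector<int>& out, int round) {
    for (int i = 0; i < 100; ++i) out.push_back(round * 100 + i);
}

// Hypothetical producer that writes through any output iterator.
template <class OutputIt>
void produce_next_into(OutputIt out, int round) {
    for (int i = 0; i < 100; ++i) *out++ = round * 100 + i;
}

// Runs several rounds with one vector; returns true if the buffer allocated
// in the first round is reused by every later round.
bool buffer_is_reused(int rounds) {
    std::vector<int> v;
    const int* first_buffer = nullptr;
    for (int r = 0; r < rounds; ++r) {
        v.clear(); // destroys the elements, keeps the capacity
        if (r % 2 == 0)
            produce_next(v, r);                          // fill v directly
        else
            produce_next_into(std::back_inserter(v), r); // or via back_inserter
        if (r == 0)
            first_buffer = v.data();
        else if (v.data() != first_buffer)
            return false; // a reallocation happened
        if (v.size() != 100 || v.back() != r * 100 + 99)
            return false;
    }
    return true;
}
```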

Answer 2:

If you write your lofactory function like this:

LargeObject lofactory( ... ) {
    // figure out constructor arguments to build a large object
    return { arg1, arg2, arg3 };  // return statement with a braced-init-list
}

In this case there is no RVO/NRVO involved; it's direct construction. Section 6.6.3 of the C++11 standard says: “A return statement with a braced-init-list initializes the object or reference to be returned from the function by copy-list-initialization (8.5.4) from the specified initializer list.”
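A minimal sketch of that rule, assuming a hypothetical `Pinned` type whose copy and move constructors are both deleted: `return {x, 2 * x};` still compiles under C++11, because the returned object is copy-list-initialized directly and neither deleted constructor is needed. Binding the result to a reference also needs no copy or move.

```cpp
#include <cassert>

// Hypothetical type that can be neither copied nor moved.
struct Pinned {
    Pinned(int first, int second) : a(first), b(second) {} // non-explicit: required for copy-list-initialization
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
    int a;
    int b;
};

// The returned object is built directly from the braced-init-list.
Pinned make_pinned(int x) {
    return {x, 2 * x};
}

int sum_via_const_ref(int x) {
    const Pinned& p = make_pinned(x); // binds directly to the returned temporary
    return p.a + p.b;
}
```

Note that `Pinned p = make_pinned(3);` would be rejected before C++17, because that initialization still requires an accessible move constructor; from C++17 on, guaranteed copy elision makes it valid too.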

Then, if you capture your object with