Avoid exponential growth of const references and rvalue references in constructor

Question

I am writing some templated classes for a machine learning library, and I keep running into this issue. I mostly use the policy pattern, where classes receive policies for different functionalities as template arguments, for example:

template <class Loss, class Optimizer> class LinearClassifier { ... }

The problem is with the constructors. As the number of policies (template parameters) grows, the number of combinations of const references and rvalue references grows exponentially. In the previous example:

LinearClassifier(const Loss& loss, const Optimizer& optimizer) : _loss(loss), _optimizer(optimizer) {}

LinearClassifier(Loss&& loss, const Optimizer& optimizer) : _loss(std::move(loss)), _optimizer(optimizer) {}

LinearClassifier(const Loss& loss, Optimizer&& optimizer) : _loss(loss), _optimizer(std::move(optimizer)) {}

LinearClassifier(Loss&& loss, Optimizer&& optimizer) : _loss(std::move(loss)), _optimizer(std::move(optimizer)) {}

Is there some way to avoid this?

Answer 1:

Actually, this is the precise reason why perfect forwarding was introduced. Rewrite the constructor as

template <typename L, typename O>
LinearClassifier(L && loss, O && optimizer)
    : _loss(std::forward<L>(loss))
    , _optimizer(std::forward<O>(optimizer))
{}

But it will probably be much simpler to do what Ilya Popov suggests in his answer. To be honest, I usually do it this way, since moves are intended to be cheap and one more move does not change things dramatically.

As Howard Hinnant has pointed out, my method can be SFINAE-unfriendly, since LinearClassifier now accepts any pair of types in its constructor. Barry's answer shows how to deal with this.
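To make the SFINAE concern concrete, here is a minimal sketch. SquaredLoss, Sgd, claims_strings and forwarded_scale are hypothetical names introduced only for this illustration; they are not part of the question's library. The point is that a type trait sees only the constructor's declaration, so it reports that the class can be built from two strings even though the constructor body would not compile for them.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical policy types, used only for this illustration.
struct SquaredLoss { double scale = 1.0; };
struct Sgd { double rate = 0.1; };

template <class Loss, class Optimizer>
class LinearClassifier {
public:
    // Unconstrained forwarding constructor, as in the answer above.
    template <typename L, typename O>
    LinearClassifier(L&& loss, O&& optimizer)
        : _loss(std::forward<L>(loss))
        , _optimizer(std::forward<O>(optimizer))
    {}

    double loss_scale() const { return _loss.scale; }

private:
    Loss _loss;
    Optimizer _optimizer;
};

using Classifier = LinearClassifier<SquaredLoss, Sgd>;

// The trait only inspects the declaration, which accepts any two types.
// Actually calling the constructor with two strings would be a hard error
// inside the member initializer list.
constexpr bool claims_strings =
    std::is_constructible<Classifier, std::string, std::string>::value;

// Intended use: the lvalue is copied, the temporary is moved.
inline double forwarded_scale() {
    SquaredLoss loss;
    loss.scale = 2.0;
    Classifier c(loss, Sgd{});
    return c.loss_scale();
}
```

Here claims_strings is true, which is exactly what misleads generic code that relies on traits or overload resolution.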

Answer 2:

This is exactly the use case for the "pass by value and move" technique. Although slightly less efficient than lvalue/rvalue overloads, it is not too bad (one extra move) and saves you the hassle.

LinearClassifier(Loss loss, Optimizer optimizer) 
    : _loss(std::move(loss)), _optimizer(std::move(optimizer)) {}

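For an lvalue argument, there will be one copy and one move; for an rvalue argument, there will be two moves (provided that your classes Loss and Optimizer implement move constructors). If the argument is a prvalue such as a temporary, the move into the parameter is typically elided, leaving a single move.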
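The copy and move counts can be checked with a small instrumented type. This is only a sketch: CountingLoss, Sgd, Counts and the two helper functions are hypothetical names made up for the demonstration.

```cpp
#include <cassert>
#include <utility>

// Hypothetical policy type that counts how often it is copied or moved.
struct CountingLoss {
    static int copies;
    static int moves;
    CountingLoss() = default;
    CountingLoss(const CountingLoss&) { ++copies; }
    CountingLoss(CountingLoss&&) noexcept { ++moves; }
};
int CountingLoss::copies = 0;
int CountingLoss::moves = 0;

struct Sgd {};  // hypothetical trivial optimizer

template <class Loss, class Optimizer>
class LinearClassifier {
public:
    // Pass by value, then move into the members.
    LinearClassifier(Loss loss, Optimizer optimizer)
        : _loss(std::move(loss)), _optimizer(std::move(optimizer)) {}

private:
    Loss _loss;
    Optimizer _optimizer;
};

struct Counts { int copies; int moves; };

// Passing an lvalue: one copy into the parameter, one move into the member.
inline Counts construct_from_lvalue() {
    CountingLoss loss;
    CountingLoss::copies = CountingLoss::moves = 0;
    LinearClassifier<CountingLoss, Sgd> c(loss, Sgd{});
    (void)c;
    return {CountingLoss::copies, CountingLoss::moves};
}

// Passing std::move(x): one move into the parameter, one into the member.
inline Counts construct_from_moved() {
    CountingLoss loss;
    CountingLoss::copies = CountingLoss::moves = 0;
    LinearClassifier<CountingLoss, Sgd> c(std::move(loss), Sgd{});
    (void)c;
    return {CountingLoss::copies, CountingLoss::moves};
}
```

The four-overload version from the question would need one copy for the lvalue case and one move for the moved case, so the price of the single constructor is one extra move per argument.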

Update: In general, the perfect forwarding solution is more efficient. On the other hand, this solution avoids templated constructors, which are not always desirable: when not constrained with SFINAE, they accept arguments of any type and produce hard errors inside the constructor if the arguments are not compatible. In other words, unconstrained templated constructors are not SFINAE-friendly. See Barry's answer for a constrained template constructor which avoids this problem.

Another potential problem with a templated constructor is that it has to be placed in a header file.

Update 2: Herb Sutter talks about this problem in his CppCon 2014 talk "Back to the Basics" starting at 1:03:48. He discusses pass by value first, then overloading on rvalue-ref, then perfect forwarding at 1:15:22 including constraining. And finally he talks about constructors as the only good use case for passing by value at 1:25:50.

Answer 3:

For the sake of completeness, the optimal 2-argument constructor would take two forwarding references and use SFINAE to ensure that they're the correct types. We can introduce the following alias:

template <class T, class U>
using decays_to = std::is_convertible<std::decay_t<T>*, U*>;

And then:

template <class L, class O,
          class = std::enable_if_t<decays_to<L, Loss>::value &&
                                   decays_to<O, Optimizer>::value>>
LinearClassifier(L&& loss, O&& optimizer)
: _loss(std::forward<L>(loss))
, _optimizer(std::forward<O>(optimizer))
{ }

This ensures that we only accept arguments that are of type Loss and Optimizer (or are derived from them). Unfortunately, it is quite a mouthful to write and is very distracting from the original intent. This is pretty difficult to get right - but if performance matters, then it matters, and this is really the only way to go.
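As a rough check of what the constraint buys, the sketch below asks std::is_constructible about a few argument types. SquaredLoss, HuberLoss, Sgd and the accepts_* constants are hypothetical names chosen for the example; only decays_to and the constructor come from the answer.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical policy types for the illustration.
struct SquaredLoss { double scale = 1.0; };
struct HuberLoss : SquaredLoss {};  // derived from the expected Loss
struct Sgd { double rate = 0.1; };

template <class T, class U>
using decays_to = std::is_convertible<std::decay_t<T>*, U*>;

template <class Loss, class Optimizer>
class LinearClassifier {
public:
    template <class L, class O,
              class = std::enable_if_t<decays_to<L, Loss>::value &&
                                       decays_to<O, Optimizer>::value>>
    LinearClassifier(L&& loss, O&& optimizer)
        : _loss(std::forward<L>(loss))
        , _optimizer(std::forward<O>(optimizer))
    {}

    double loss_scale() const { return _loss.scale; }

private:
    Loss _loss;
    Optimizer _optimizer;
};

using Classifier = LinearClassifier<SquaredLoss, Sgd>;

// Exact types and derived types pass the constraint; unrelated types do not.
constexpr bool accepts_exact =
    std::is_constructible<Classifier, const SquaredLoss&, Sgd>::value;
constexpr bool accepts_derived =
    std::is_constructible<Classifier, HuberLoss, Sgd>::value;
constexpr bool accepts_strings =
    std::is_constructible<Classifier, std::string, std::string>::value;

// A derived argument is accepted, but only its SquaredLoss part is stored.
inline double sliced_scale() {
    HuberLoss huber;
    huber.scale = 3.0;
    Classifier c(huber, Sgd{});
    return c.loss_scale();
}
```

Note that accepting derived types means the stored member is a sliced copy of the base part, which may or may not be what you want for a policy.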

But if it doesn't matter, and if Loss and Optimizer are cheap to move (or, better still, performance for this constructor is completely irrelevant), prefer Ilya Popov's solution:

LinearClassifier(Loss loss, Optimizer optimizer)
: _loss(std::move(loss))
, _optimizer(std::move(optimizer))
{ }

Answer 4:

How far down the rabbit hole do you want to go?

I'm aware of 4 decent ways to approach this problem. You should generally use the earlier ones if you match their preconditions, as each later one increases significantly in complexity.

In most cases, either a move is so cheap that doing it twice is effectively free, or a move is just a copy.

If a move is a copy, and the copy is not free, take the parameter by const&. Otherwise, take it by value.

This will behave basically optimally, and makes your code far easier to understand.

LinearClassifier(Loss loss, Optimizer const& optimizer)
  : _loss(std::move(loss))
  , _optimizer(optimizer)
{}

The constructor above assumes a cheap-to-move Loss and an Optimizer whose move is just a copy.

In all cases, this does one extra move per by-value parameter compared with the "optimal" perfect forwarding shown below (note: perfect forwarding is not actually optimal). So long as move is cheap, this is the best solution, because it generates clean error messages, allows {} based construction, and is far easier to read than any of the other solutions.

Consider using this solution.
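The {} based construction mentioned above can be sketched as follows. HingeLoss, Sgd and make_classifier are hypothetical names invented for the example. Because each parameter has a concrete type, a braced list can initialize it directly, which a deduced forwarding-reference parameter cannot do.

```cpp
#include <cassert>
#include <utility>

// Hypothetical aggregate policy types for the illustration.
struct HingeLoss { double margin; };      // cheap to move
struct Sgd { double rate; int batch; };   // move is just a copy

template <class Loss, class Optimizer>
class LinearClassifier {
public:
    LinearClassifier(Loss loss, Optimizer const& optimizer)
        : _loss(std::move(loss))
        , _optimizer(optimizer)
    {}

    const Loss& loss() const { return _loss; }
    const Optimizer& optimizer() const { return _optimizer; }

private:
    Loss _loss;
    Optimizer _optimizer;
};

inline LinearClassifier<HingeLoss, Sgd> make_classifier() {
    // Each braced list initializes a parameter whose type is already known.
    return LinearClassifier<HingeLoss, Sgd>({2.0}, {0.01, 32});
}
```

With a template<class L, class O> constructor the same call would fail to compile, since no type can be deduced from a bare braced list.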

If move is cheaper than copy yet not free, one approach is based on perfect forwarding. Either:

template<class L, class O>
LinearClassifier(L&& loss, O&& optimizer)
  : _loss(std::forward<L>(loss))
  , _optimizer(std::forward<O>(optimizer))
{}

Or the more complex and more overload-friendly:

template<class L, class O,
  std::enable_if_t<
    std::is_same<std::decay_t<L>, Loss>{}
    && std::is_same<std::decay_t<O>, Optimizer>{}
  , int> * = nullptr
>
LinearClassifier(L&& loss, O&& optimizer)
  : _loss(std::forward<L>(loss))
  , _optimizer(std::forward<O>(optimizer))
{}

This costs you the ability to do {} based construction of your arguments. Also, up to an exponential number of constructors can be instantiated from the above code if they are all called (hopefully they will be inlined).

You can drop the std::enable_if_t clause at the cost of SFINAE-friendliness: without it, the wrong overload of your constructor can be picked. If you have constructor overloads with the same number of arguments, or care about early failure, then you want the std::enable_if_t version. Otherwise, use the simpler one.

This solution is usually considered "most optimal". It is acceptably optimal, but it is not the most optimal.
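The overload problem can be illustrated with a sketch in which the class has a second two-argument constructor. The (long, long) constructor, the _from_policies flag and the helper functions are hypothetical additions made up for this example; the constraint uses ::value instead of {} but is otherwise the one shown above.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct SquaredLoss {};  // hypothetical policy types
struct Sgd {};

template <class Loss, class Optimizer>
class LinearClassifier {
public:
    template<class L, class O,
      std::enable_if_t<
        std::is_same<std::decay_t<L>, Loss>::value
        && std::is_same<std::decay_t<O>, Optimizer>::value
      , int> * = nullptr
    >
    LinearClassifier(L&& loss, O&& optimizer)
      : _loss(std::forward<L>(loss))
      , _optimizer(std::forward<O>(optimizer))
      , _from_policies(true)
    {}

    // Hypothetical second constructor with the same number of arguments.
    LinearClassifier(long n_features, long n_classes)
      : _loss()
      , _optimizer()
      , _from_policies(false)
      , _n_weights(n_features * n_classes)
    {}

    bool built_from_policies() const { return _from_policies; }
    long weight_count() const { return _n_weights; }

private:
    Loss _loss;
    Optimizer _optimizer;
    bool _from_policies;
    long _n_weights = 0;
};

using Classifier = LinearClassifier<SquaredLoss, Sgd>;

// int arguments fail the constraint, so the (long, long) overload is chosen.
inline bool ints_pick_policy_ctor() {
    Classifier c(3, 4);
    return c.built_from_policies();
}

inline bool policies_pick_policy_ctor() {
    Classifier c(SquaredLoss{}, Sgd{});
    return c.built_from_policies();
}
```

Without the enable_if_t clause, Classifier c(3, 4) would select the template as an exact match for (int, int) and then fail to compile inside its initializer list.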

The next step is to use emplace construction with tuples.

private:
template<std::size_t...LIs, std::size_t...OIs, class...Ls, class...Os>
LinearClassifier(std::piecewise_construct_t,
  std::index_sequence<LIs...>, std::tuple<Ls...>&& ls,
  std::index_sequence<OIs...>, std::tuple<Os...>&& os
)
  : _loss(std::get<LIs>(std::move(ls))...)
  , _optimizer(std::get<OIs>(std::move(os))...)
{}
public:
template<class...Ls, class...Os>
LinearClassifier(std::piecewise_construct_t,
  std::tuple<Ls...> ls,
  std::tuple<Os...> os
):
  LinearClassifier(std::piecewise_construct_t{},
    std::index_sequence_for<Ls...>{}, std::move(ls),
    std::index_sequence_for<Os...>{}, std::move(os)
  )
{}

where we defer construction until inside the LinearClassifier. This allows you to have non-copy/moveable objects in your object, and is arguably maximally efficient.

To see how this works, examine how piecewise_construct works with std::pair. You pass std::piecewise_construct first, then use std::forward_as_tuple to pass the arguments for constructing each element (which may be the arguments of a copy or move constructor).
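For reference, a minimal std::pair example of that pattern might look like this. The function name make_pair_in_place is made up for the illustration; the std::pair piecewise constructor and std::forward_as_tuple are standard.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Builds both members in place from their constructor arguments.
inline std::pair<std::string, std::vector<int>> make_pair_in_place() {
    return std::pair<std::string, std::vector<int>>(
        std::piecewise_construct,
        std::forward_as_tuple(3, 'x'),   // calls std::string(3, 'x')
        std::forward_as_tuple(2, 7));    // calls std::vector<int>(2, 7)
}
```

The first member ends up as "xxx" and the second as a vector holding two sevens, without a temporary string or vector ever being created and moved.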

By directly constructing objects, we can eliminate a move or a copy per object compared to the perfect-forwarding solution above. It also lets you forward a copy or a move if required.
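A usage sketch of the piecewise constructor, with a policy that can be neither copied nor moved, could look like the following. PinnedLoss, Sgd, the accessors and piecewise_demo are hypothetical names added for the example; the two constructors follow the code above, using the std::piecewise_construct tag object in the delegating call.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical policy that can be neither copied nor moved.
struct PinnedLoss {
    explicit PinnedLoss(double m) : margin(m) {}
    PinnedLoss(const PinnedLoss&) = delete;
    PinnedLoss& operator=(const PinnedLoss&) = delete;
    double margin;
};

struct Sgd {  // hypothetical optimizer with a two-argument constructor
    Sgd(double r, int b) : rate(r), batch(b) {}
    double rate;
    int batch;
};

template <class Loss, class Optimizer>
class LinearClassifier {
    // Unpacks each tuple straight into the matching member's constructor.
    template<std::size_t...LIs, std::size_t...OIs, class...Ls, class...Os>
    LinearClassifier(std::piecewise_construct_t,
      std::index_sequence<LIs...>, std::tuple<Ls...>&& ls,
      std::index_sequence<OIs...>, std::tuple<Os...>&& os)
      : _loss(std::get<LIs>(std::move(ls))...)
      , _optimizer(std::get<OIs>(std::move(os))...)
    {}

public:
    template<class...Ls, class...Os>
    LinearClassifier(std::piecewise_construct_t,
      std::tuple<Ls...> ls, std::tuple<Os...> os)
      : LinearClassifier(std::piecewise_construct,
          std::index_sequence_for<Ls...>{}, std::move(ls),
          std::index_sequence_for<Os...>{}, std::move(os))
    {}

    double loss_margin() const { return _loss.margin; }
    int batch() const { return _optimizer.batch; }

private:
    Loss _loss;
    Optimizer _optimizer;
};

// Constructs both members in place and reports what they hold.
inline std::pair<double, int> piecewise_demo() {
    LinearClassifier<PinnedLoss, Sgd> c(
        std::piecewise_construct,
        std::forward_as_tuple(0.5),
        std::forward_as_tuple(0.01, 32));
    return std::make_pair(c.loss_margin(), c.batch());
}
```

The tuples hold references to the caller's temporaries, so they should be consumed within the same full expression, as they are here.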

A final cute technique is to type-erase construction. Practically, this requires something like std::experimental::optional<T> to be available, and might make the class a bit larger.

This is not faster than the piecewise construction one. It does abstract the work that the emplace construction one does, making it simpler on a per-use basis, and it permits you to split ctor body from the header file. But there is a small amount of overhead, in both runtime and space.

There is a bunch of boilerplate you need to start with. This generates a template class that represents the concept of "constructing an object, later, at a place someone else will tell me."

#include <experimental/optional>
#include <functional>
#include <memory>
#include <tuple>
#include <type_traits>
#include <utility>

struct delayed_emplace_t {};
template<class T>
struct delayed_construct {
  std::function< void(std::experimental::optional<T>&) > ctor;
  delayed_construct(delayed_construct const&)=delete; // class is single-use
  delayed_construct(delayed_construct &&)=default;
  // no arguments: default-construct the T later
  delayed_construct():
    ctor([](auto&op){op.emplace();})
  {}
  // forward arbitrary constructor arguments; the parameters are named U/Us
  // so that they do not shadow the class template parameter T
  template<class U, class...Us,
    std::enable_if_t<
      sizeof...(Us)!=0
      || !std::is_same<std::decay_t<U>, delayed_construct>{}
    ,int>* = nullptr
  >
  delayed_construct(U&&u, Us&&...us):
    delayed_construct( delayed_emplace_t{}, std::forward<U>(u), std::forward<Us>(us)... )
  {}
  // the tuple of references is held through a shared_ptr so that the lambda
  // stays copyable, which std::function requires
  template<class U, class...Us>
  delayed_construct(delayed_emplace_t, U&&u, Us&&...us):
    ctor([tup = std::make_shared<std::tuple<U&&, Us&&...>>(std::forward<U>(u), std::forward<Us>(us)...)]( auto& op ) {
      ctor_helper(op, std::make_index_sequence<sizeof...(Us)+1>{}, std::move(*tup));
    })
  {}
  // unpack the stored arguments into optional<T>::emplace
  template<std::size_t...Is, class...Ts>
  static void ctor_helper(std::experimental::optional<T>& op, std::index_sequence<Is...>, std::tuple<Ts...>&& tup) {
    op.emplace( std::get<Is>(std::move(tup))... );
  }
  // run the stored construction once, then drop it
  void operator()(std::experimental::optional<T>& target) {
    ctor(target);
    ctor = nullptr;
  }
  explicit operator bool() const { return !!ctor; }
};

where we type-erase the action of constructing an optional from arbitrary arguments.

LinearClassifier( delayed_construct<Loss> loss, delayed_construct<Optimizer> optimizer ) {
  loss(_loss);
  optimizer(_optimizer);
}

where _loss and _optimizer are std::experimental::optional<Loss> and std::experimental::optional<Optimizer>. To remove the optionality of _loss you have to use std::aligned_storage_t<sizeof(Loss), alignof(Loss)> and be very careful to write a ctor that handles exceptions and manually destroys things. It is a headache.

Some nice things about this last pattern are that the body of the ctor can move out of the header, and at most a linear amount of code is generated instead of an exponential number of template constructors.

This solution is marginally less efficient than the piecewise construction version, as not all compilers will be able to inline the std::function use. But it also permits storing non-movable objects.

Code not tested, so there are probably typos.

In C++17, with guaranteed elision, the optional part of the delayed ctor becomes obsolete. Any function returning a T is all you need for a delayed ctor of T.
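One way to read that last remark is a constructor that takes factories instead of values. The following is only a sketch of that idea under that reading: CountingLoss, Sgd, Report, the factory parameter names and build_with_factories are all hypothetical. The counters are there to show that the member is initialized directly from the factory's result; C++17 guarantees this, and under C++17 the policy type would not even need a copy or move constructor.

```cpp
#include <cassert>
#include <utility>

// Hypothetical policy type that counts copies and moves.
struct CountingLoss {
    static int copies;
    static int moves;
    explicit CountingLoss(double m) : margin(m) {}
    CountingLoss(const CountingLoss& other) : margin(other.margin) { ++copies; }
    CountingLoss(CountingLoss&& other) noexcept : margin(other.margin) { ++moves; }
    double margin;
};
int CountingLoss::copies = 0;
int CountingLoss::moves = 0;

struct Sgd { double rate; };  // hypothetical optimizer

template <class Loss, class Optimizer>
class LinearClassifier {
public:
    // Each argument is a callable that returns the policy by value.
    template <class LossFactory, class OptimizerFactory>
    LinearClassifier(LossFactory&& make_loss, OptimizerFactory&& make_optimizer)
        : _loss(std::forward<LossFactory>(make_loss)())
        , _optimizer(std::forward<OptimizerFactory>(make_optimizer)())
    {}

    double loss_margin() const { return _loss.margin; }

private:
    Loss _loss;
    Optimizer _optimizer;
};

struct Report { double margin; int copies; int moves; };

// Builds a classifier from two lambdas and reports the copy/move counts.
inline Report build_with_factories() {
    CountingLoss::copies = CountingLoss::moves = 0;
    LinearClassifier<CountingLoss, Sgd> c(
        [] { return CountingLoss(0.5); },
        [] { return Sgd{0.01}; });
    return {c.loss_margin(), CountingLoss::copies, CountingLoss::moves};
}
```

Like the unconstrained forwarding constructor, this template accepts any two callables, so the same SFINAE caveats apply if the class has other two-argument constructors.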

Source: https://stackoverflow.com/questions/36868442/avoid-exponential-grow-of-const-references-and-rvalue-references-in-constructor

Tags

c++

c++11

rvalue-reference

const-reference