noexcept, stack unwinding and performance

感情迁移 submitted on 2019-11-28 05:14:45

There's "no" overhead and then there's no overhead. You can think of the compiler in different ways:

  • It generates a program which performs certain actions.
  • It generates a program satisfying certain constraints.

The TR (the C++ Performance Technical Report) says there's no overhead in the table-driven approach because no action needs to be taken as long as no throw occurs. The non-exceptional execution path runs straight through.

However, to make the tables work, the non-exceptional code still needs additional constraints. Each object needs to be fully initialized before any exception could lead to its destruction, limiting the reordering of instructions (e.g. from an inlined constructor) across potentially throwing calls. Likewise, an object must be completely destroyed before any possible subsequent exception.
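To make the constraint concrete, here is a minimal sketch (Tracker, may_throw, work and run are hypothetical names). Because may_throw() might throw, t must be completely constructed before the call, since the unwinder will run ~Tracker() if an exception propagates. The code only shows the observable guarantee, not the instruction-ordering restriction itself, which you would have to look for in the generated assembly.

```cpp
#include <stdexcept>

// Hypothetical instrumentation: counts constructions and destructions.
struct Tracker {
    static int constructed;
    static int destroyed;
    Tracker() { ++constructed; }
    ~Tracker() { ++destroyed; }
};
int Tracker::constructed = 0;
int Tracker::destroyed = 0;

// Stands in for any opaque, potentially throwing (non-noexcept) call.
void may_throw(bool fail) {
    if (fail) throw std::runtime_error("failure");
}

// `t` has to be fully constructed before may_throw() runs, because the
// EH tables say ~Tracker() must be called if may_throw() throws.
int work(bool fail) {
    Tracker t;
    may_throw(fail);
    return 42;
}

// Returns -1 when work() threw; by then ~Tracker() has already run.
int run(bool fail) {
    try {
        return work(fail);
    } catch (const std::runtime_error&) {
        return -1;
    }
}
```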

Table-based unwinding only works with functions that follow the ABI calling conventions and have stack frames. Without the possibility of an exception, the compiler would have been free to ignore the ABI and omit the frame.

Space overhead, a.k.a. bloat, in the form of tables and separate exceptional code paths, might not affect execution time, but it can still affect time taken to download the program and load it into RAM.

It's all relative, but noexcept cuts the compiler some slack.

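The difference between noexcept and throw() is that, with throw(), the stack is still unwound and destructors are called before std::unexpected() is invoked, so the implementation has to keep track of the stack (see 15.5.2 The std::unexpected() function in the C++11 standard). This applies to C++11/14: since C++17, throw() is equivalent to noexcept(true), and C++20 removed it altogether.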

By contrast, std::terminate() does not require the stack to be unwound (15.5.1 states that it is implementation-defined whether or not the stack is unwound before std::terminate() is called).

GCC does indeed seem not to unwind the stack for noexcept: Demo
while Clang still unwinds: Demo

(You can comment out f_noexcept() and uncomment f_emptythrow() in the demos to see that, with throw(), both GCC and Clang unwind the stack.)
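The demo code isn't reproduced here, so the following is a hypothetical reconstruction in the same spirit, not the original demo (Noisy, report_terminate, throw_error, leaks_exception and run_demo are made-up names). Note that noexcept(expr) is a compile-time query: it reports true for leaks_exception() even though an exception can reach that function's boundary, and at run time such an exception ends in std::terminate().

```cpp
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <stdexcept>

// Hypothetical type whose destructor reports whether unwinding reached it.
struct Noisy {
    ~Noisy() { std::fputs("~Noisy() ran (stack was unwound)\n", stderr); }
};

// Hypothetical terminate handler so the outcome is visible on stderr.
[[noreturn]] void report_terminate() {
    std::fputs("std::terminate() called\n", stderr);
    std::abort();
}

// Ordinary throwing function (not noexcept).
void throw_error() { throw std::runtime_error("escapes a noexcept function"); }

// The exception cannot leave this noexcept function, so std::terminate()
// is called; whether ~Noisy() runs first is implementation-defined.
void leaks_exception() noexcept {
    Noisy n;
    throw_error();
}

// Installs the handler and triggers the violation; never returns normally.
void run_demo() {
    std::set_terminate(report_terminate);
    leaks_exception();
}
```

Calling run_demo() from a main() and watching stderr shows whether the ~Noisy() message appears before the terminate message; as described above, GCC and Clang may differ here, and both behaviors are permitted.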

Take the following example:

#include <stdio.h>

int fun(int a) {

  int res;
  try
  {
    res = a *11;
    if(res == 33)
       throw 20;
  }
  catch (int e)
  {
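    const char *msg = "error"; // string literals are const in C++
    printf("%s\n", msg);       // never pass non-literal text as the format string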
  }
  return res;
}

int main(int argc, char** argv) {
  return fun(argc);
}

The input (argc) isn't known at compile time, so even with -O3 the compiler can make no assumption that would let it elide the call or the exception machinery entirely.

In LLVM IR, the fun function roughly translates to:

define i32 @_Z3funi(i32 %a) #0 {
entry:
  %mul = mul nsw i32 %a, 11 ; the actual processing
  %cmp = icmp eq i32 %mul, 33
  br i1 %cmp, label %if.then, label %try.cont ; branch to if.then if res == 33

if.then:                                          ; lots of work happens here...
  %exception = tail call i8* @__cxa_allocate_exception(i64 4) #3
  %0 = bitcast i8* %exception to i32*
  store i32 20, i32* %0, align 4, !tbaa !1
  invoke void @__cxa_throw(i8* %exception, i8* bitcast (i8** @_ZTIi to i8*), i8* null) #4
          to label %unreachable unwind label %lpad

lpad:                                             
  %1 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
          catch i8* bitcast (i8** @_ZTIi to i8*)
  ... ; ...and here too

invoke.cont:
  ... ; ...and here
  br label %try.cont

try.cont:                                         ; the normal (non-exceptional) path lands here
  ret i32 %mul

eh.resume:                                        
  resume { i8*, i32 } %1

unreachable:                                    
  unreachable
}

As you can see, even though the code path is straightforward under normal control flow (no exceptions), the function now consists of several basic blocks and branches.

It is true that at runtime almost no cost is incurred, since you pay only for what you use (if you don't throw, nothing extra happens), but having multiple branches might still hurt performance, e.g.

  • branch prediction becomes harder
  • register pressure might increase substantially
  • [others]

And you certainly can't run pass-through-branch optimizations across the normal control flow and the landing pads/exception entry points.

Exceptions are a complex mechanism, and noexcept greatly simplifies the compiler's job even with zero-cost EH.
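A minimal sketch of where the landing pads come from (all names are hypothetical): the same RAII object is paired once with a potentially throwing callee and once with a noexcept one. Only the first caller should need an invoke plus a cleanup landing pad for ~Guard().

```cpp
#include <stdexcept>

// Hypothetical RAII type: its destructor must run if an exception passes.
struct Guard {
    int& destroyed;
    explicit Guard(int& counter) : destroyed(counter) {}
    ~Guard() { ++destroyed; }
};

// Not noexcept: any caller must be prepared for it to throw.
int checked_double(int x) {
    if (x < 0) throw std::invalid_argument("negative input");
    return x * 2;
}

// noexcept: callers may assume no exception escapes.
int clamped_double(int x) noexcept { return x < 0 ? 0 : x * 2; }

// The call may throw, so the compiler generally emits an invoke and a
// cleanup landing pad that runs ~Guard() during unwinding.
int with_cleanup(int x, int& destroyed) {
    Guard g(destroyed);
    return checked_double(x);
}

// The callee is noexcept, so nothing can unwind through this call and the
// compiler can typically use a plain call with no landing pad.
int without_cleanup(int x, int& destroyed) noexcept {
    Guard g(destroyed);
    return clamped_double(x);
}
```

One way to check is to compile with clang++ -O1 -S -emit-llvm and look for invoke and landingpad in with_cleanup versus without_cleanup; the exact output depends on the compiler version and flags.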


Edit: in the specific case of the noexcept specifier, if the compiler can't prove that your code doesn't throw, it sets up an EH path that calls std::terminate (the details are implementation-dependent). In both cases (the code provably doesn't throw, or this can't be proven), the mechanics involved are simpler and the compiler is less constrained. In any case, you don't really use noexcept purely for optimization; it is also an important semantic indication.
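A well-known example of that semantic role is library code that queries noexcept: when std::vector reallocates, it relocates existing elements with std::move_if_noexcept, so a copyable type whose move constructor isn't noexcept gets copied instead of moved. A minimal sketch, with hypothetical Counters, Item and grow names:

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical counters recording which constructor the vector used.
struct Counters {
    int copies = 0;
    int moves = 0;
};

// Item<true> has a noexcept move constructor; Item<false> does not.
template <bool NoexceptMove>
struct Item {
    Counters* counters;
    explicit Item(Counters* c) : counters(c) {}
    Item(const Item& other) : counters(other.counters) { ++counters->copies; }
    Item(Item&& other) noexcept(NoexceptMove) : counters(other.counters) {
        ++counters->moves;
    }
};

static_assert(std::is_nothrow_move_constructible<Item<true>>::value,
              "move is declared noexcept");
static_assert(!std::is_nothrow_move_constructible<Item<false>>::value,
              "move may throw");

// Appends n elements. On each reallocation the existing elements are moved
// only if that cannot throw; otherwise they are copied, to keep the strong
// exception guarantee of emplace_back.
template <bool NoexceptMove>
Counters grow(std::size_t n) {
    Counters c;
    std::vector<Item<NoexceptMove>> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.emplace_back(&c);
    }
    return c;
}
```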

I just made a benchmark to measure the performance effect of adding a 'noexcept' specifier, for various test cases (https://github.com/N-Dekker/noexcept_benchmark). It has a specific test case that could take advantage of the possibility of skipping stack unwinding with 'noexcept':

void recursive_func(recursion_data& data) noexcept // or no 'noexcept'!
{
  if (--data.number_of_func_calls_to_do > 0)
  {
    noexcept_benchmark::throw_exception_if(data.volatile_false);
    object_class stack_object(data.object_counter);
    recursive_func(data);
  }
}

https://github.com/N-Dekker/noexcept_benchmark/blob/v03/lib/stack_unwinding_test.cpp#L48

Looking at the benchmark results, it appears that both VS2017 x64 and GCC 5.4.0 yield a significant performance gain from adding 'noexcept', in this specific test case.
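For readers who want to try that test case locally, here is a self-contained sketch of the excerpt above. recursion_data, object_class and noexcept_benchmark::throw_exception_if are hypothetical stand-ins written for this example; the real definitions are in the linked repository and may differ, and no timing code is included.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the benchmark's helper.
namespace noexcept_benchmark {
inline void throw_exception_if(bool condition) {
    if (condition) throw std::runtime_error("should not happen");
}
}  // namespace noexcept_benchmark

// Hypothetical stand-in for the benchmark's data.
struct recursion_data {
    int number_of_func_calls_to_do;
    volatile bool volatile_false;  // always false, but opaque to the optimizer
    int object_counter;            // number of live object_class instances
};

// Hypothetical object with a non-trivial destructor, so unwinding has work to do.
class object_class {
public:
    explicit object_class(int& counter) : counter_(counter) { ++counter_; }
    ~object_class() { --counter_; }
    object_class(const object_class&) = delete;
    object_class& operator=(const object_class&) = delete;
private:
    int& counter_;
};

// Same shape as the excerpt. With noexcept, the recursive call cannot throw,
// so the compiler can typically drop the cleanup path for stack_object.
void recursive_func(recursion_data& data) noexcept  // or without noexcept
{
    if (--data.number_of_func_calls_to_do > 0)
    {
        noexcept_benchmark::throw_exception_if(data.volatile_false);
        object_class stack_object(data.object_counter);
        recursive_func(data);
    }
}
```

Removing noexcept from recursive_func gives the other variant; timing both on your own compiler is the only way to see whether the gain reported above carries over.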
