A lot of literature talks about using inline functions to "avoid the overhead of a function call". However, I haven't seen quantifiable data. What is the actual overhead of a function call?
Modern CPUs are very fast (obviously!). Almost every operation involved in calling a function and passing arguments is a full-speed instruction (indirect calls might be slightly more expensive, mostly the first time through a loop).
Function call overhead is so small that only loops that call functions can make it noticeable.
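If you want a rough number for your own machine, you can time a tight loop of deliberately non-inlined calls. Here is a minimal sketch, assuming GCC or Clang (the [[gnu::noinline]] attribute is GCC/Clang-specific, and AddOne and the iteration count are made up for illustration). The printed figure also includes the loop and the addition itself, so read it as an upper bound of a few nanoseconds per call:

#include <chrono>
#include <cstdio>

// noinline keeps the compiler from inlining away the call we want to measure.
[[gnu::noinline]] int AddOne(int x) { return x + 1; }

int main()
{
    constexpr long kIters = 100'000'000;
    const auto start = std::chrono::steady_clock::now();
    int acc = 0;
    for (long i = 0; i < kIters; ++i)
        acc = AddOne(acc);  // one non-inlined call per iteration
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    std::printf("~%.2f ns per call (result %d)\n", elapsed.count() / kIters, acc);
}

Compile with optimizations enabled (e.g. -O2), or the loop bookkeeping will dominate the measurement.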
Therefore, when we talk about (and measure) function call overhead today, we are usually really talking about the overhead of not being able to hoist common subexpressions out of loops. If a function has to do a bunch of (identical) work every time it is called, the compiler can "hoist" that work out of the loop and do it once, provided the function is inlined. When it is not inlined, the code will probably just go ahead and repeat the work every iteration; after all, you told it to!
Inlined functions therefore seem impossibly faster, not because of call and argument overhead, but because of the common subexpressions that can be hoisted out of the function.
Example:
Foo CheckOverhead(int i);  // declared here so MakeMeFaster can call it

Foo::result_type MakeMeFaster()
{
    Foo t = 0;
    for (auto i = 0; i < 1000; ++i)
        t += CheckOverhead(SomethingUnpredictable());
    return t.result();
}

Foo CheckOverhead(int i)
{
    // The expensive part: recomputed on every call,
    // even though the result never changes.
    auto n = CalculatePi_1000_digits();
    return i * n;
}
An optimizer can see through this foolishness and do:
Foo::result_type MakeMeFaster()
{
    Foo t = 0;
    // The loop-invariant call has been hoisted out of the loop:
    auto _hidden_optimizer_tmp = CalculatePi_1000_digits();
    for (auto i = 0; i < 1000; ++i)
        t += SomethingUnpredictable() * _hidden_optimizer_tmp;
    return t.result();
}
It seems like call overhead is impossibly reduced, but what has really happened is that a big chunk of the function (the CalculatePi_1000_digits call) was hoisted out of the loop. The compiler needs to be able to prove that CalculatePi_1000_digits always returns the same result, but good optimizers can do that.
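When the compiler cannot see the body (say, the function lives in another translation unit), you can sometimes still get this hoisting by promising the compiler the function is pure. A minimal sketch, assuming GCC or Clang: [[gnu::const]] declares that the result depends only on the arguments (here, on nothing at all), so the optimizer may call it once and reuse the value. The declarations below are hypothetical:

// [[gnu::const]]: the result depends only on the arguments and there are
// no side effects, so repeated calls may be merged and hoisted out of loops.
[[gnu::const]] long CalculatePi_1000_digits();   // defined elsewhere (hypothetical)
int SomethingUnpredictable();                    // defined elsewhere (hypothetical)

long Accumulate()
{
    long t = 0;
    for (int i = 0; i < 1000; ++i)
        // The optimizer may evaluate CalculatePi_1000_digits() once,
        // outside the loop, even though it never sees the body.
        t += SomethingUnpredictable() * CalculatePi_1000_digits();
    return t;
}

Note that this is a promise, not something the compiler checks: if the function actually has side effects, declaring it const invites miscompilation.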