Question
All,
I'm writing some performance-sensitive code, including a 3D vector class that will be doing lots of cross products. As a long-time C++ programmer, I know all about the evils of macros and the various benefits of inline functions. I've long been under the impression that inline functions should be approximately the same speed as macros. However, in performance testing macro vs. inline functions, I've made an interesting discovery that I hope is the result of me making a stupid mistake somewhere: the macro version of my function appears to be over 8 times as fast as the inline version!
First, a ridiculously trimmed down version of a simple vector class:
class Vector3d
{
public:
    double m_tX, m_tY, m_tZ;

    Vector3d() : m_tX(0), m_tY(0), m_tZ(0) {}
    Vector3d(const double &tX, const double &tY, const double &tZ)
        : m_tX(tX), m_tY(tY), m_tZ(tZ) {}

    static inline void CrossAndAssign(const Vector3d& cV1, const Vector3d& cV2, Vector3d& cV)
    {
        cV.m_tX = cV1.m_tY * cV2.m_tZ - cV1.m_tZ * cV2.m_tY;
        cV.m_tY = cV1.m_tZ * cV2.m_tX - cV1.m_tX * cV2.m_tZ;
        cV.m_tZ = cV1.m_tX * cV2.m_tY - cV1.m_tY * cV2.m_tX;
    }

    #define FastVectorCrossAndAssign(cV1,cV2,cVOut) { \
        cVOut.m_tX = cV1.m_tY * cV2.m_tZ - cV1.m_tZ * cV2.m_tY; \
        cVOut.m_tY = cV1.m_tZ * cV2.m_tX - cV1.m_tX * cV2.m_tZ; \
        cVOut.m_tZ = cV1.m_tX * cV2.m_tY - cV1.m_tY * cV2.m_tX; }
};
Here's my sample benchmarking code:
Vector3d right;
Vector3d forward(1.0, 2.2, 3.6);
Vector3d up(3.2, 1.4, 23.6);
clock_t start = clock();
for (long l=0; l < 100000000; l++)
{
Vector3d::CrossAndAssign(forward, up, right); // static inline version
}
clock_t end = clock();
std::cout << end - start << std::endl;
clock_t start2 = clock();
for (long l=0; l<100000000; l++)
{
FastVectorCrossAndAssign(forward, up, right); // macro version
}
clock_t end2 = clock();
std::cout << end2 - start2 << std::endl;
The end result: with optimizations turned completely off, the inline version takes 3200 ticks and the macro version 500 ticks. With optimizations turned on (/O2, maximize speed, and other speed tweaks), I can get the inline version down to 1100 ticks, which is better but still not the same.
So I appeal to all of you: is this really true? Have I made a stupid mistake somewhere? Or are inline functions really this much slower -- and if so, why?
Answer 1:
NOTE: After posting this answer, the original question was edited to remove this problem. I'll leave the answer as it is instructive on several levels.
The loops differ in what they do!
If we manually expand the macro as it was originally posted (without braces), we get:
for (long l=0; l<100000000; l++)
right.m_tX = forward.m_tY * up.m_tZ - forward.m_tZ * up.m_tY;
right.m_tY = forward.m_tZ * up.m_tX - forward.m_tX * up.m_tZ;
right.m_tZ = forward.m_tX * up.m_tY - forward.m_tY * up.m_tX;
Note the absence of curly braces. So the compiler sees this as:
for (long l=0; l<100000000; l++)
{
right.m_tX = forward.m_tY * up.m_tZ - forward.m_tZ * up.m_tY;
}
right.m_tY = forward.m_tZ * up.m_tX - forward.m_tX * up.m_tZ;
right.m_tZ = forward.m_tX * up.m_tY - forward.m_tY * up.m_tX;
Which makes it obvious why the second loop is so much faster.
Update: This is also a good example of why macros are evil :)
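A minimal, self-contained sketch of the same trap, with hypothetical counters standing in for the three vector components (the macro and function names here are illustrative, not from the question). The unbraced macro repeats only its first statement inside an unbraced for loop; the do { ... } while (0) version repeats both.

```cpp
// UNSAFE_INCREMENT_BOTH mimics the originally posted, unbraced macro:
// two statements that only look like one unit at the call site.
#define UNSAFE_INCREMENT_BOTH(a, b) \
    ++(a);                          \
    ++(b)

// Common idiom: wrapping the body in do { ... } while (0) makes the
// expansion a single statement, so it works as an unbraced loop body.
#define SAFE_INCREMENT_BOTH(a, b) \
    do {                          \
        ++(a);                    \
        ++(b);                    \
    } while (0)

// Only ++a is repeated by the loop; ++b runs once, after the loop ends.
void runUnsafe(int iterations, int& a, int& b) {
    for (int i = 0; i < iterations; ++i)
        UNSAFE_INCREMENT_BOTH(a, b);
}

// With the do/while(0) macro, both increments run on every iteration.
void runSafe(int iterations, int& a, int& b) {
    for (int i = 0; i < iterations; ++i)
        SAFE_INCREMENT_BOTH(a, b);
}
```

With 10 iterations starting from zero, runUnsafe leaves a == 10 but b == 1, while runSafe leaves both at 10. Plain braces, as in the edited question, also fix the loop case; do { ... } while (0) is the more common idiom because it additionally stays a single statement when followed by a semicolon inside an if/else.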
Answer 2:
Please note that the inline keyword is only a hint to the compiler. With optimizations turned off, the compiler may not inline the function at all. Go to Project Settings/C++/Optimization/ and make sure optimization is turned on. What setting have you used for "Inline Function Expansion"?
Answer 3:
It also depends on optimizations and compiler settings. Also look for your compiler's support for an always-inline/force-inline declaration. A function that actually gets inlined is as fast as a macro.
By default, the keyword is only a hint; force inline/always inline (for the most part) gives the programmer back control over the keyword's original intent.
Finally, GCC (for example) can be told to warn you when such a function is not inlined as requested (the -Winline option).
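A minimal sketch of a portable force-inline wrapper, assuming MSVC's __forceinline and GCC/Clang's __attribute__((always_inline)). The FORCE_INLINE macro name and the trimmed Vector3d struct are illustrative stand-ins, not part of the question's code.

```cpp
// Hypothetical portability macro: picks the compiler-specific
// "force inline" spelling where known, else falls back to plain inline.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
// GCC expects always_inline functions to be declared inline as well.
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline
#endif

// Trimmed stand-in for the question's Vector3d (aggregate, no constructors).
struct Vector3d {
    double m_tX, m_tY, m_tZ;
};

// Same body as the question's CrossAndAssign, as a force-inlined free function.
FORCE_INLINE void CrossAndAssign(const Vector3d& cV1, const Vector3d& cV2, Vector3d& cV) {
    cV.m_tX = cV1.m_tY * cV2.m_tZ - cV1.m_tZ * cV2.m_tY;
    cV.m_tY = cV1.m_tZ * cV2.m_tX - cV1.m_tX * cV2.m_tZ;
    cV.m_tZ = cV1.m_tX * cV2.m_tY - cV1.m_tY * cV2.m_tX;
}
```

For example, crossing {1, 2, 3} with {4, 5, 6} stores {-3, 6, -3} in the output. Forcing inlining is still not an absolute guarantee in every situation, so it is worth confirming the result in the generated code or the compiler's warnings.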
Answer 4:
Apart from what Philipp mentioned, if you're using MSVC, you can use __forceinline, or the GCC equivalent __attribute__((always_inline)), to correct the problems with inlining.
However, there is another possible problem lurking: a macro re-evaluates its arguments at every point where the parameter appears in its body, so if you call the macro like so:
FastVectorCrossAndAssign(getForward(), up, right);
it will expand to:
right.m_tX = getForward().m_tY * up.m_tZ - getForward().m_tZ * up.m_tY;
right.m_tY = getForward().m_tZ * up.m_tX - getForward().m_tX * up.m_tZ;
right.m_tZ = getForward().m_tX * up.m_tY - getForward().m_tY * up.m_tX;
Not what you want when you're concerned with speed :) (especially if getForward() isn't a lightweight function, or does some incrementing on each call. If getForward() is an inline function, the compiler might be able to merge the repeated calls, provided it has no side effects such as volatile accesses, but that still won't fix everything.)
Source: https://stackoverflow.com/questions/3810221/c-vs2008-performance-of-macros-vs-inline-functions