Why does an inline function have lower efficiency than an in-built function?

挽巷 2021-01-03 20:59

I was trying a question on arrays in InterviewBit. In this question I made an inline function returning the absolute value of an integer. But I was told that my algorithm wa

3 answers
  •  青春惊慌失措
    2021-01-03 21:21

    Your abs branches on a condition, while the built-in variant just strips the sign from the integer, most likely without branching and in just a couple of instructions. Possible assembly example (taken from here):

    cdq
    xor eax, edx
    sub eax, edx
    

    The cdq instruction copies the sign bit of register eax into every bit of register edx. If eax holds a non-negative number, edx will be zero; otherwise edx will be 0xFFFFFFFF, which denotes -1. The xor with the original number changes nothing if the number is non-negative (any number xor 0 is unchanged). When eax is negative, however, eax xor 0xFFFFFFFF yields (not eax). The final step subtracts edx from eax. Again, if eax is non-negative, edx is zero and the value stays the same. For negative values, (~eax) - (-1) = -eax, which is the value wanted.

    As you can see this approach uses only three simple arithmetic instructions and no conditional branching at all.
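The three-instruction sequence above can be sketched in portable C++. This is only an illustration, not what any particular library ships: the name `branchless_abs` is made up here, and the sketch assumes a 32-bit two's-complement int whose right shift is arithmetic (true on mainstream compilers, and guaranteed since C++20).

```cpp
#include <cstdint>

// Hypothetical helper mirroring cdq / xor / sub; not a library function.
inline std::int32_t branchless_abs(std::int32_t x)
{
    // Like cdq: the mask is 0 for x >= 0 and -1 (0xFFFFFFFF) for x < 0.
    // Assumes an arithmetic right shift on signed values.
    const std::int32_t mask = x >> 31;

    // Work in unsigned arithmetic so that INT32_MIN does not trigger
    // signed overflow; the result simply wraps back to INT32_MIN.
    const std::uint32_t ux = static_cast<std::uint32_t>(x);
    const std::uint32_t um = static_cast<std::uint32_t>(mask);

    // xor flips all bits when negative; subtracting the mask then adds 1.
    return static_cast<std::int32_t>((ux ^ um) - um);
}
```

For example, `branchless_abs(-5)` computes `(~(-5)) - (-1)`, which is `4 + 1`. In practice an optimizing compiler will often emit similar code for the plain conditional form, so a hand-written version like this is rarely needed.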

    Edit: After some research it turned out that many built-in implementations of abs use the same conditional approach, return __x >= 0 ? __x : -__x;, and such a pattern is an obvious target for compiler optimization to avoid unnecessary branching.

    However, that does not justify using a custom abs implementation: it violates the DRY principle, and no one can guarantee that your implementation will be just as good in more sophisticated scenarios or on unusual platforms. Typically one should think about rewriting a library function only when a definite performance problem or some other defect has been detected in the existing implementation.

    Edit 2: Just switching from int to float shows a considerable performance degradation for the custom version. The library version:

    #include <cmath>

    float libfoo(float x)
    {
        return ::std::fabs(x);
    }
    
    andps   xmm0, xmmword ptr [rip + .LCPI0_0]
    

    And a custom version:

    inline float my_fabs(float x)
    {
        return x > 0.0f ? x : -x;
    }
    
    float myfoo(float x)
    {
        return my_fabs(x);
    }
    
    movaps  xmm1, xmmword ptr [rip + .LCPI1_0] # xmm1 = [-0.000000e+00,-0.000000e+00,-0.000000e+00,-0.000000e+00]
    xorps   xmm1, xmm0
    xorps   xmm2, xmm2
    cmpltss xmm2, xmm0
    andps   xmm0, xmm2
    andnps  xmm2, xmm1
    orps    xmm0, xmm2
    

    online compiler
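To illustrate what the single andps in the library version does, here is a minimal sketch that clears the sign bit of a float by hand. The name `mask_fabs` is hypothetical, and the sketch assumes a 32-bit IEEE 754 float; it is not how any specific standard library implements fabs.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: clears the sign bit, like andps with a 0x7FFFFFFF mask.
inline float mask_fabs(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");

    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // copy out the bit pattern (well-defined type pun)
    bits &= 0x7FFFFFFFu;                  // clear bit 31, the IEEE 754 sign bit
    std::memcpy(&x, &bits, sizeof x);     // copy the cleared pattern back
    return x;
}
```

One likely reason the compiler cannot reduce the custom version to the same single instruction is that the two are not equivalent: with `x > 0.0f ? x : -x`, an input of +0.0f takes the negated branch and comes back as -0.0f, whereas clearing the sign bit always yields +0.0f.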
