I was trying a question on arrays in InterviewBit. In this question I made an inline function returning the absolute value of an integer. But I was told that my algorithm wa
Your abs performs branching based on a condition. While the built-in variant just removes the sign bit from the integer, most likely using just a couple of instructions. Possible assembly example (taken from here):
cdq
xor eax, edx
sub eax, edx
The cdq copies the sign of the register eax to register edx. For example, if it is a positive number, edx will be zero, otherwise, edx will be 0xFFFFFF which denotes -1. The xor operation with the origin number will change nothing if it is a positive number (any number xor 0 will not change). However, when eax is negative, eax xor 0xFFFFFF yields (not eax). The final step is to subtract edx from eax. Again, if eax is positive, edx is zero, and the final value is still the same. For negative values, (~ eax) – (-1) = –eax which is the value wanted.
As you can see this approach uses only three simple arithmetic instructions and no conditional branching at all.
Edit: After some research it turned out that many built-in implementations of abs use the same approach, return __x >= 0 ? __x : -__x;, and such a pattern is an obvious target for compiler optimization to avoid unnecessary branching.
However, that does not justify the use of custom abs implementation as it violates the DRY principle and no one can guarantee that your implementation is going to be just as good for more sophisticated scenarios and/or unusual platforms. Typically one should think about rewriting some of the library functions only when there is a definite performance problem or some other defect detected in existing implementation.
Edit2: Just switching from int to float shows considerable performance degradation:
float libfoo(float x)
{
return ::std::fabs(x);
}
andps xmm0, xmmword ptr [rip + .LCPI0_0]
And a custom version:
inline float my_fabs(float x)
{
return x>0.0f?x:-x;
}
float myfoo(float x)
{
return my_fabs(x);
}
movaps xmm1, xmmword ptr [rip + .LCPI1_0] # xmm1 = [-0.000000e+00,-0.000000e+00,-0.000000e+00,-0.000000e+00]
xorps xmm1, xmm0
xorps xmm2, xmm2
cmpltss xmm2, xmm0
andps xmm0, xmm2
andnps xmm2, xmm1
orps xmm0, xmm2
online compiler