Are 2^n exponent calculations really less efficient than bit-shifts?

长情又很酷 2020-12-15 21:01

If I do:

int x = 4;
pow(2, x);

Is that really that much less efficient than just doing:

1 << 4

?

4 answers
  • 2020-12-15 21:44

    That depends on the compiler, but in general (when the compiler is not totally braindead) yes: the shift is a single CPU instruction, while pow is a function call, which involves saving the current state and setting up a stack frame, and that takes many instructions.

  • 2020-12-15 21:51

    Yes. An easy way to show this is to compile the following two functions that do the same thing and then look at the disassembly.

    #include <stdint.h>
    #include <math.h>
    
    uint32_t foo1(uint32_t shftAmt) {
        return pow(2, shftAmt);
    }
    
    uint32_t foo2(uint32_t shftAmt) {
        return (1u << shftAmt);
    }
    

    cc -arch armv7 -O3 -S -o - shift.c (I happen to find ARM assembly easier to read, but if you want x86, just remove the -arch flag)

    _foo1:
    @ BB#0:
        push    {r7, lr}
        vmov    s0, r0
        mov     r7, sp
        vcvt.f64.u32    d16, s0
        vmov    r0, r1, d16
        blx     _exp2
        vmov    d16, r0, r1
        vcvt.u32.f64    s0, d16
        vmov    r0, s0
        pop     {r7, pc}
    
    _foo2:
    @ BB#0:
        movs    r1, #1
        lsl.w   r0, r1, r0
        bx      lr
    

    You can see that foo2 takes only two instructions plus the return (bx lr), whereas foo1 takes ten, not counting the work done inside _exp2. foo1 has to move the data to the FP hardware registers (vmov), convert the integer to a double (vcvt.f64.u32), call the exp2 function (the compiler has already rewritten pow(2, x) as exp2(x)), convert the answer back to a uint (vcvt.u32.f64), and move it from the FP hardware back to the general-purpose registers.

  • 2020-12-15 21:56

    Yes. Though by how much I can't say. The easiest way to determine that is to benchmark it.

    The pow function works on doubles, at least if it conforms to the C standard. Even if it switched to a bit shift on seeing a base of 2, it would still have to test and branch to reach that conclusion, by which time your simple bit shift would already be done. And that is before counting the overhead of the function call.
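
    To make the "testing and branching" point concrete, here is a minimal sketch of a power routine with a base-2 fast path. It is an integer stand-in, not how any real pow is implemented; the name ipow_u32 and the choice of exponentiation by squaring for the general path are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical integer stand-in for a pow-like routine (illustration only).
   Results wrap modulo 2^32 on overflow. */
uint32_t ipow_u32(uint32_t base, uint32_t exp)
{
    /* Fast path: this comparison and branch must run before the shift can. */
    if (base == 2 && exp < 32) {
        return UINT32_C(1) << exp;
    }

    /* General path: exponentiation by squaring. */
    uint32_t result = 1;
    while (exp > 0) {
        if (exp & 1u) {
            result *= base;   /* fold in the current power of base */
        }
        base *= base;         /* square for the next exponent bit */
        exp >>= 1;
    }
    return result;
}
```

    Even when the fast path is taken, the caller still pays for the call, the two comparisons, and the branch on top of the shift itself.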

    For equivalency, I assume you meant to use 1 << x instead of 1 << 4.

    Perhaps a compiler could optimize both of these, but it's far less likely to optimize a call to pow. If you need the fastest way to compute a power of 2, do it with shifting.
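
    If you go the shifting route, a small wrapper like the sketch below keeps the shift well defined. The helper name pow2_u32 is hypothetical, and it assumes you only need results that fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 2^n for 0 <= n <= 31. pow2_u32 is a hypothetical helper name. */
uint32_t pow2_u32(unsigned n)
{
    assert(n < 32);             /* shifting by the type width or more is undefined */
    return UINT32_C(1) << n;    /* an unsigned 1 keeps n == 31 well defined */
}
```

    Using an unsigned constant matters: with a plain int 1, shifting into the sign bit (n == 31 on a 32-bit int) is undefined behaviour in C.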

    Update... Since I mentioned it's easy to benchmark, I decided to do just that. I happen to have Windows and Visual C++ handy so I used that. Results will vary. My program:

    #include <Windows.h>
    
    #include <cstdio>
    #include <cstdlib>   // srand, rand, system
    #include <cmath>
    #include <ctime>
    
    LARGE_INTEGER liFreq, liStart, liStop;
    
    
    inline void StartTimer()
    {
        QueryPerformanceCounter(&liStart);
    }
    
    
    inline double ReportTimer()
    {
        QueryPerformanceCounter(&liStop);
        double milli = 1000.0 * double(liStop.QuadPart - liStart.QuadPart) / double(liFreq.QuadPart);
        printf( "%.3f ms\n", milli );
        return milli;
    }
    
    
    int main()
    {    
        QueryPerformanceFrequency(&liFreq);
    
        const size_t nTests = 10000000;
        // Unsigned sums: the totals can exceed INT_MAX, and unsigned wraparound is well defined.
        unsigned int sumPow = 0;
        unsigned int sumShift = 0;
    
        double powTime, shiftTime;
    
        // Make an array of random exponents (0..30) to use in tests.
        const size_t nExp = 10000;
        int e[nExp];
        srand( (unsigned int)time(NULL) );
        for( size_t i = 0; i < nExp; i++ ) e[i] = rand() % 31;
    
        // Test power.
        StartTimer();
        for( size_t i = 0; i < nTests; i++ )
        {
            unsigned int y = (unsigned int)pow(2.0, (double)e[i%nExp]);
            sumPow += y;
        }
        powTime = ReportTimer();
    
        // Test shifting.
        StartTimer();
        for( size_t i = 0; i < nTests; i++ )
        {
            unsigned int y = 1u << e[i%nExp];
            sumShift += y;
        }
        shiftTime = ReportTimer();
    
        // The compiler shouldn't optimize out our loops if we need to display a result.
        printf( "Sum power: %u\n", sumPow );
        printf( "Sum shift: %u\n", sumShift );
    
        printf( "Time ratio of pow versus shift: %.2f\n", powTime / shiftTime );
    
        system("pause");
        return 0;
    }
    

    My output:

    379.466 ms
    15.862 ms
    Sum power: 157650768
    Sum shift: 157650768
    Time ratio of pow versus shift: 23.92
    
  • 2020-12-15 21:56

    Generally yes, as a bit shift is a very basic operation for the processor.

    On the other hand, many compilers optimise the code so that raising 2 to a power ends up as just a bit shift.
