Is integer multiplication really done at the same speed as addition on a modern CPU?

梦谈多话 2020-12-23 13:31

I hear this statement quite often, that multiplication on modern hardware is so optimized that it actually is at the same speed as addition. Is that true?

I never ca

11 Answers
  •  醉话见心
    2020-12-23 14:23

    Even on ARM (known for its high efficiency and small, clean design), integer multiplication takes 3-7 cycles, whereas integer addition takes 1 cycle.

    However, an add/shift trick is often used to multiply integers by constants faster than the multiply instruction can calculate the answer.

    The reason this works well on ARM is that ARM has a "barrel shifter", which allows many instructions to shift or rotate one of their arguments by 1-31 bits at zero cost, i.e. x = a + b and x = a + (b << s) take exactly the same amount of time.

    Utilizing this processor feature, let's say you want to calculate a * 15. Then since 15 = 1111 (base 2), the following pseudocode (translated into ARM assembly) would implement the multiplication:

    a_times_3 = a + (a << 1)                  // a * (0011 (base 2))
    a_times_15 = a_times_3 + (a_times_3 << 2) // a * (0011 (base 2) + 1100 (base 2))
    
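    As a sanity check, the pseudocode above translates directly into C; shifts and adds on an unsigned integer give the same result as a plain multiplication by 15:

    ```c
    #include <stdint.h>

    /* a * 15 via two shift-adds, as in the pseudocode above:
       15 = 0b1111, computed as a*3 = a*0b0011, then a*3 + a*12. */
    static uint32_t times15(uint32_t a) {
        uint32_t a_times_3  = a + (a << 1);                 /* a * 0b0011 */
        uint32_t a_times_15 = a_times_3 + (a_times_3 << 2); /* a*3 + a*3*4 */
        return a_times_15;
    }
    ```

    Whether this is actually faster than a `mul` instruction depends on the target; on a pipelined ARM core with a free barrel shift, each line above is a single-cycle instruction.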

    Similarly you could multiply by 13 = 1101 (base 2) using either of the following:

    a_times_5 = a + (a << 2)
    a_times_13 = a_times_5 + (a << 3)
    
    a_times_3 = a + (a << 1)
    a_times_15 = a_times_3 + (a_times_3 << 2)
    a_times_13 = a_times_15 - (a << 1)
    

    The first snippet is obviously faster in this case, but sometimes subtraction helps when translating a constant multiplication into add/shift combinations.
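    For comparison, both variants can be written out in C; the identifiers mirror the pseudocode above, and both compute a * 13:

    ```c
    #include <stdint.h>

    /* Variant 1: 13 = 5 + 8, two shift-adds. */
    static uint32_t times13_v1(uint32_t a) {
        uint32_t a_times_5 = a + (a << 2); /* a * 0b0101 */
        return a_times_5 + (a << 3);       /* a*5 + a*8 = a*13 */
    }

    /* Variant 2: 13 = 15 - 2, using a subtraction at the end. */
    static uint32_t times13_v2(uint32_t a) {
        uint32_t a_times_3  = a + (a << 1);
        uint32_t a_times_15 = a_times_3 + (a_times_3 << 2);
        return a_times_15 - (a << 1);      /* a*15 - a*2 = a*13 */
    }
    ```

    Variant 1 needs two instructions to variant 2's three, which is why it wins here; the subtractive form pays off for constants like 255 or 63, where the all-ones pattern is one shift-subtract away.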

    This multiplication trick was used heavily in the ARM assembly coding community in the late 80s, on the Acorn Archimedes and Acorn Risc PC (the machines the ARM processor originated on). Back then, a lot of ARM assembly was written by hand, since squeezing every last cycle out of the processor mattered. Coders in the ARM demoscene developed many techniques like this for speeding up code, most of which are probably lost to history now that almost no assembly code is written by hand anymore. Compilers probably incorporate many tricks like this, but I'm sure there are many more that never made the transition from "black art optimization" to compiler implementation.

    You can of course write explicit add/shift multiplication code like this in any compiled language, and the code may or may not run faster than a straight multiplication once compiled.

    x86_64 can also benefit from this trick for small constants, although shifting is not folded into other instructions on x86_64 the way it is on ARM: a standalone shift or rotate is a separate instruction costing (typically) one cycle on both Intel and AMD implementations. On the other hand, x86_64 has the `lea` instruction, which can combine a shift by 0-3 bits with an addition in a single instruction, and compilers commonly use it to implement multiplication by constants like 3, 5, and 9.
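    As a generalization of the trick, any constant multiplier decomposes into one shift-add per set bit of the constant. This is a sketch of the idea, not what a real compiler emits (compilers also use subtraction and `lea`-style folding, as discussed above); the function name `mul_by_shifts` is made up for illustration:

    ```c
    #include <stdint.h>

    /* Multiply a by k using only shifts and adds: one add per set bit
       of k, since k = sum of 2^bit over its set bits. */
    static uint32_t mul_by_shifts(uint32_t a, uint32_t k) {
        uint32_t result = 0;
        for (int bit = 0; bit < 32; bit++) {
            if (k & (1u << bit))
                result += a << bit; /* add a * 2^bit */
        }
        return result;
    }
    ```

    For a constant with many set bits, the subtractive forms shown earlier need fewer operations than this naive per-bit expansion, which is exactly the trade-off a compiler's strength-reduction pass weighs against just issuing a multiply.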
