Micro fusion and addressing modes

后悔当初 2020-11-21 06:07

I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA).

The following instruction using [base+index] addressing

4 answers
  •  天命终不由人
    2020-11-21 06:38

    I have now reviewed test results for Intel Sandy Bridge, Ivy Bridge, Haswell and Broadwell. I have not had access to test on a Skylake yet. The results are:

    • Instructions with two-register addressing and three input dependencies fuse all right. They take only one entry in the micro-operation cache as long as they contain no more than 32 bits of data (or 2 * 16 bits).
    • It is possible to make instructions with four input dependencies, using fused multiply-and-add instructions on Haswell and Broadwell. These instructions still fuse into a single micro-op and take only one entry in the micro-op cache.
    • Instructions with more than 32 bits of data, for example a 32-bit address and 8 bits of immediate data, can still fuse, but use two entries in the micro-operation cache (unless the 32 bits can be compressed into a 16-bit signed integer).
    • Instructions with rip-relative addressing and an immediate constant do not fuse, even if both the offset and the immediate constant are very small.
    • All the results are identical on the four machines tested.
    • The tests were performed with my own test programs using the performance monitoring counters on loops that were sufficiently small to fit into the micro-op cache.

    Your results may be due to other factors. I have not tried to use the IACA.
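
The observations listed in the answer above can be condensed into a small toy model. The following is only a sketch under stated assumptions: the names `MemInstr`, `stored_disp_bits` and `predict` are hypothetical, the 16-bit compression is assumed to apply only to a 32-bit address displacement, and nothing is predicted about micro-op cache usage for instructions that do not fuse, because the answer does not report it. It restates the reported measurements and is not a description of the actual hardware logic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MemInstr:
    """Hypothetical description of one instruction with a memory operand."""
    disp: int = 0               # value of the address displacement (0 if none)
    disp_bits: int = 0          # encoded displacement width in bits: 0, 8 or 32
    imm_bits: int = 0           # width of the immediate constant in bits (0 if none)
    rip_relative: bool = False  # True for rip-relative addressing


def stored_disp_bits(ins: MemInstr) -> int:
    """Bits of address data assumed to be stored in the micro-op cache.

    Assumption: a 32-bit displacement whose value fits in a signed
    16-bit integer is compressed to 16 bits, as the answer suggests.
    """
    if ins.disp_bits == 32 and -32768 <= ins.disp <= 32767:
        return 16
    return ins.disp_bits


def predict(ins: MemInstr) -> Tuple[bool, Optional[int]]:
    """Return (fuses, micro-op cache entries) according to the listed rules.

    The entry count is None when the instruction does not fuse,
    because the answer gives no figure for that case.
    """
    # Rule: rip-relative addressing plus an immediate constant does not fuse.
    if ins.rip_relative and ins.imm_bits > 0:
        return (False, None)
    # Rule: up to 32 bits of data take one entry, more than that take two.
    data_bits = stored_disp_bits(ins) + ins.imm_bits
    return (True, 1 if data_bits <= 32 else 2)


if __name__ == "__main__":
    # 32-bit address that cannot be compressed, plus an 8-bit immediate
    print(predict(MemInstr(disp=0x12345678, disp_bits=32, imm_bits=8)))
    # 32-bit address that fits in 16 bits, plus an 8-bit immediate
    print(predict(MemInstr(disp=100, disp_bits=32, imm_bits=8)))
    # rip-relative operand with a small immediate
    print(predict(MemInstr(disp=8, disp_bits=32, imm_bits=8, rip_relative=True)))
```

The number of input dependencies is deliberately left out of the model, since the answer reports that instructions with three and even four input dependencies still fused on the machines tested.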
