How can I find the micro-ops which instructions on Intel's x86 CPUs decode to?

慢半拍i 2021-02-01 20:53

The Intel Optimization Reference, under Section 3.5.1, advises:

"Favor single-micro-operation instructions."

"Avoid using complex instructions (for example, e

4 answers
  •  自闭症患者
    2021-02-01 21:12

    In addition to the resources already mentioned in the other answers (Agner Fog's tables and IACA), you can find detailed information on the μops of most x86 instructions on recent Intel CPUs (from Nehalem to Cannon Lake) on our website uops.info. The website also contains information on the latency and throughput of each instruction. The data was obtained by running automatically generated microbenchmarks both on the actual hardware (using hardware performance counters) and on top of different versions of Intel IACA.

    Compared to Agner Fog's instruction tables, the data on uops.info is in several cases more accurate and precise. As an example, consider the PBLENDVB instruction on Nehalem. According to Agner Fog's tables, the instruction has one μop that can only use port 0, and one μop that can only use port 5. This is probably based on the observation that when executing the instruction repeatedly in isolation, there is, on average, one μop on port 0, and one μop on port 5. The microbenchmarks on uops.info show that actually both μops can use port 0 and port 5. This is determined by executing the instruction together with instructions that can only use port 0 or port 5.
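    The reasoning above can be illustrated with a toy model. The sketch below is not how uops.info actually measures anything: it assumes an idealized scheduler that always spreads μops so that the busiest port is as lightly loaded as possible, and the function name `port_loads`, the hypothesis lists, and the port-0-only "blocker" μops are hypothetical names chosen for illustration. It only shows why running the instruction in isolation cannot tell the two port mappings apart, while adding port-0-only instructions can.

```python
from collections import Counter
from itertools import product


def port_loads(uops):
    """Return {port: uop count} for the assignment that minimizes the busiest port.

    `uops` is a list of sets; each set holds the ports one uop may execute on.
    This brute-force search stands in for an idealized scheduler (an assumption).
    """
    best_peak, best_loads = None, None
    for choice in product(*[sorted(ports) for ports in uops]):
        loads = Counter(choice)
        peak = max(loads.values())
        if best_peak is None or peak < best_peak:
            best_peak, best_loads = peak, loads
    return dict(best_loads)


# Two candidate port mappings for PBLENDVB on Nehalem (from the text above).
fixed_ports = [{0}, {5}]           # one uop on port 0 only, one on port 5 only
flexible_ports = [{0, 5}, {0, 5}]  # both uops may use port 0 or port 5

# Hypothetical blocking instructions whose single uop can only use port 0.
blockers = [{0}, {0}]

for name, hypothesis in [("fixed", fixed_ports), ("flexible", flexible_ports)]:
    isolated = port_loads(hypothesis)
    blocked = port_loads(hypothesis + blockers)
    print(name, "isolated:", isolated, "with port-0 blockers:", blocked)
```

    Under this model both mappings put one μop on each port when the instruction runs alone. With two port-0-only blockers added, the fixed mapping predicts three μops on port 0 and one on port 5, whereas the flexible mapping predicts two on each, so the per-port counters would distinguish them.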

    The data on uops.info also reveals several inaccuracies in Intel's IACA. For example, on Skylake, IACA reports that both μops of the CVTPI2PS XMM, MM instruction can only use port 0 (http://uops.info/html-ports/SKL/CVTPI2PS_XMM_MM-IACA3.0.html). On the actual hardware, there is one μop that can only use port 0, and one μop that can use either port 0 or port 1. Agner Fog also observed that one μop of this instruction can use port 1; however, he claims that this μop can only use port 1, which is incorrect.
