The Intel Optimization Reference, under Section 3.5.1, advises:
\"Favor single-micro-operation instructions.\"
\"Avoid using complex instructions (for example, e
Agner Fog's insn tables show which port each micro-op runs on, which is all that matters for performance. They don't show exactly what each uop does (i.e. which execution unit it uses on that port), because that's not something you can reverse-engineer.
It's easy to guess in some cases, though: haddps on Haswell is 1 uop for port 1 and 2 uops for port 5. That's pretty obviously 2 shuffles (port 5) and an FP add (port 1). There are lots of other execution units on port 5, e.g. vector booleans, SIMD integer add, and lots of scalar integer stuff, but given that haddps needs multiple uops at all, it's pretty obvious that Intel implements it with shuffles and a regular "vertical" add uop.
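
To make that guess concrete, here's a C intrinsics sketch (my own illustration, not anything Intel documents) that reproduces the haddps result with exactly that shape: two 2-input shufps-style shuffles feeding one ordinary vertical add.

```c
#include <immintrin.h>   /* SSE/SSE3 intrinsics; compile with -msse3 for the check in main */
#include <stdio.h>

/* Guessing at the uop breakdown: two 2-input shuffles (port 5) feeding one
   ordinary vertical addps (port 1).  This reproduces the result of
   haddps a,b; whether Intel's internal uops look exactly like this is a
   guess, not documented behaviour. */
static __m128 haddps_guess(__m128 a, __m128 b) {
    __m128 evens = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2, 0, 2, 0)); /* a0 a2 b0 b2 */
    __m128 odds  = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3, 1, 3, 1)); /* a1 a3 b1 b3 */
    return _mm_add_ps(evens, odds);   /* a0+a1, a2+a3, b0+b1, b2+b3 */
}

int main(void) {
    __m128 a = _mm_setr_ps(1, 2, 3, 4), b = _mm_setr_ps(10, 20, 30, 40);
    float r[4], h[4];
    _mm_storeu_ps(r, haddps_guess(a, b));
    _mm_storeu_ps(h, _mm_hadd_ps(a, b));      /* the real instruction, for comparison */
    printf("guess: %g %g %g %g\n", r[0], r[1], r[2], r[3]);
    printf("hadd : %g %g %g %g\n", h[0], h[1], h[2], h[3]);
    return 0;
}
```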
It might be possible to figure out something about the dependency relationship between those uops (e.g. is it 2 shufps-style shuffles feeding an FP add, or is it shuffle-add-shuffle?). We also can't tell whether the shuffles are independent of each other or not: Haswell only has one shuffle port, so the resource conflict would give us 5c total latency either way (1c + 1c for the serialized shuffles plus 3c for the add, instead of 1c + 3c = 4c), because the shuffles couldn't run in parallel even if they were independent.
Both shuffle uops probably need both inputs, so even if they're independent of each other, having one input ready sooner than the other doesn't shorten the critical path (from the later-arriving input to the output).
If it were possible to implement HADDPS with 2 independent one-input shuffles, that would mean HADDPS xmm0, xmm1 in a loop where xmm1 was a constant would only add 4c of latency to the dep chain involving xmm0. I haven't measured, but I think that's unlikely; almost certainly it's two independent 2-input shuffles feeding an ADDPS uop.
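
If you wanted to test it, something like this would do (a rough sketch I haven't run, with an arbitrary iteration count): chain haddps through xmm0 with a loop-invariant second operand and look at cycles per iteration. Around 4c/iter would mean the constant input is off the critical path; around 5c/iter matches the 2-input-shuffle guess.

```c
/* Compile with e.g. gcc -O2 -msse3.  Loop latency test for haddps with a
   loop-invariant second operand: the only loop-carried dep chain is x. */
#include <immintrin.h>
#include <x86intrin.h>   /* __rdtsc */
#include <stdio.h>

int main(void) {
    __m128 x = _mm_set1_ps(1.0f);
    const __m128 c = _mm_set1_ps(2.0f);   /* loop-invariant second operand */
    const long iters = 100000000;

    unsigned long long t0 = __rdtsc();
    for (long i = 0; i < iters; i++)
        x = _mm_hadd_ps(x, c);            /* dep chain runs through x only */
    unsigned long long t1 = __rdtsc();

    volatile float sink = _mm_cvtss_f32(x);   /* keep the chain from being optimized away */
    (void)sink;

    /* rdtsc counts reference cycles, not core clocks; use perf stat and a
       pinned core frequency for real numbers. */
    printf("~%.2f ref cycles per haddps\n", (double)(t1 - t0) / iters);
    return 0;
}
```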