What is the point of SSE2 instructions such as orpd?

Submitted by 我只是一个虾纸丫 on 2020-07-30 06:04:04

Question


The orpd instruction is a "bitwise logical OR of packed double precision floating point values". Doesn't this do exactly the same thing as por ("bitwise logical OR")? If so, what's the point of having it?


Answer 1:


Remember that SSE1 orps came first. (Well, actually, MMX por mm, mm/mem came even before SSE1.)

Making the SSE2 orpd instruction the same opcode with a new 66 prefix makes sense for the hardware decoder logic, I guess, just like movapd vs. movaps. Several instructions like this are redundant between the ps and pd versions, but some aren't: addps vs. addpd are different operations, and unpcklps vs. unpcklpd are different shuffles.

SSE2 also introduced 66 0F EB /r por xmm, xmm/mem, at least partly for consistency with MMX 0F EB /r por mm, mm/mem: again the same opcode with a new mandatory prefix, just like paddb mm, mm vs. paddb xmm, xmm.

But it also allows for different bypass-forwarding domains for vector-integer vs. FP. Different microarchitectures have behaved differently in how they actually decoded and ran these instructions. Some ran all the XMM OR instructions (orps, orpd, por) the same way, creating extra latency when forwarding between the FP and SIMD-integer domains.

No CPU has ever actually had different forwarding domains for FP-float vs. FP-double, so yes, movapd and orpd are in practice useless wastes of space that you should never use. Use the smaller movaps / orps encodings instead: the pd forms need an extra 66 prefix byte.

(Or with VEX encoding it doesn't matter: vorps and vorpd are the same size, 2-byte prefix + opcode + modrm ...)


por vs. orps

For more about bypass delay when using por between FP math instructions like addps, or orps between SIMD-integer insns like paddb, see

  • Do I get a performance penalty when mixing SSE integer/float SIMD instructions
  • What's the difference between logical SSE intrinsics?
  • Difference between the AVX instructions vxorpd and vpxor
  • Does using mix of pxor and xorps affect performance?
  • Is there any situation where using MOVDQU and MOVUPD is better than MOVUPS?
  • Choosing SSE instruction execution domains in mixed contexts - pre-Skylake, integer versions have better throughput.

And in case anyone was wondering about the other interpretation of the title: bitwise booleans on FP values are mostly used to set, clear, or toggle the sign bit, or to work with cmpps/cmppd masks, for example for blending.



Source: https://stackoverflow.com/questions/62111946/what-is-the-point-of-sse2-instructions-such-as-orpd
