Does using mix of pxor and xorps affect performance?
问题 I've come across a fast CRC computation using PCLMULQDQ implementation. I see, that guys mix pxor and xorps instructions heavily like in the fragment below: movdqa xmm10, [rk9] movdqa xmm8, xmm0 pclmulqdq xmm0, xmm10, 0x11 pclmulqdq xmm8, xmm10, 0x0 pxor xmm7, xmm8 xorps xmm7, xmm0 movdqa xmm10, [rk11] movdqa xmm8, xmm1 pclmulqdq xmm1, xmm10, 0x11 pclmulqdq xmm8, xmm10, 0x0 pxor xmm7, xmm8 xorps xmm7, xmm1 Is there any practical reason for this? Performance boost? If yes, then what lies