Most efficient way to check if all __m128i components are 0 [using <= SSE4.1 intrinsics]

試著忘記壹切 submitted on 2019-11-28 00:58:59

You can use the PTEST instruction via the _mm_testz_si128 intrinsic (SSE4.1), like this:

#include <smmintrin.h> // SSE4.1 header

__m128i diff = _mm_xor_si128(oldRect, newRect); // XOR: nonzero bits mark differences
if (!_mm_testz_si128(diff, diff))
{
    // rectangle has changed
}

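Note that _mm_testz_si128 returns 1 if the bitwise AND of its two arguments is zero, and 0 otherwise. Passing the same vector as both arguments therefore tests whether that vector is all zeros.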
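For a self-contained version, here is a minimal C sketch of the PTEST approach. The helper names all_zero_sse41 and vectors_differ_sse41 are hypothetical, and the __attribute__((target("sse4.1"))) annotation is a GCC/Clang-specific assumption that lets the file compile without -msse4.1; with MSVC, or when building with -msse4.1, drop it. Either way the CPU must support SSE4.1 at run time.

```c
#include <smmintrin.h> // SSE4.1: _mm_testz_si128 (also pulls in the SSE2 intrinsics)

/* Assumption (GCC/Clang only): enable SSE4.1 code generation for these
   functions so the file builds without -msse4.1. */
#define SSE41_FN __attribute__((target("sse4.1")))

/* Hypothetical helper: returns 1 if all 128 bits of v are zero.
   PTEST sets ZF when (v & v) == 0, i.e. when v itself is zero. */
SSE41_FN int all_zero_sse41(__m128i v)
{
    return _mm_testz_si128(v, v);
}

/* Hypothetical helper: returns 1 if a and b differ in at least one bit,
   i.e. their XOR is not all zeros. */
SSE41_FN int vectors_differ_sse41(__m128i a, __m128i b)
{
    __m128i diff = _mm_xor_si128(a, b);
    return !_mm_testz_si128(diff, diff);
}
```

For example, vectors_differ_sse41(oldRect, newRect) returns 1 exactly when the two rectangles are not bit-for-bit identical.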

Ironically, the PTEST instruction from SSE4.1 may be slower than PMOVMSKB from SSE2 in some cases. I suggest simply using:

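#include <emmintrin.h> // SSE2 header
__m128i cmp = _mm_cmpeq_epi32(oldRect, newRect); // all-ones in each equal 32-bit lane
if (_mm_movemask_epi8(cmp) != 0xFFFF)            // not every byte compared equal
{
    // registers are different
}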

Note that if you really need that xor value, you'll have to compute it separately.
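To make that note concrete, here is a sketch, assuming plain SSE2 and C, that answers the title question directly: it tests an arbitrary __m128i for all-zero bits by comparing each byte against zero and checking that PMOVMSKB reports all 16 bytes as equal. The function names all_zero_sse2 and differ_with_xor_sse2 are hypothetical helpers, not taken from the answers above.

```c
#include <emmintrin.h> // SSE2 only

/* Hypothetical helper: returns 1 if every bit of v is zero.
   _mm_cmpeq_epi8 yields 0xFF in each byte that equals zero;
   _mm_movemask_epi8 gathers the top bit of each of the 16 bytes. */
int all_zero_sse2(__m128i v)
{
    __m128i eq = _mm_cmpeq_epi8(v, _mm_setzero_si128());
    return _mm_movemask_epi8(eq) == 0xFFFF;
}

/* Hypothetical helper for when the XOR value itself is needed:
   compute it explicitly, hand it back, and test it with SSE2. */
int differ_with_xor_sse2(__m128i a, __m128i b, __m128i *xor_out)
{
    __m128i x = _mm_xor_si128(a, b);
    *xor_out = x;
    return !all_zero_sse2(x);
}
```

This costs one extra instruction (the XOR) compared with comparing oldRect and newRect directly, which is why the direct comparison is preferable when the XOR is not needed.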

On Intel processors such as Ivy Bridge, PaulR's version (XOR followed by _mm_testz_si128) translates into 4 uops, while the suggested version, which never computes the XOR, translates into 3 uops (see also this thread). This may give my version better throughput.
