Horizontal xor of two SSE values

邮差的信 submitted on 2019-12-23 22:19:37

Question


I need to compute a horizontal XOR of two 128-bit values, each treated as four 32-bit integers, and combine the two results into one 64-bit integer. The operation is equivalent to this:

uint32_t x0[4];
uint32_t x1[4];

uint32_t xor0 = x0[0];
uint32_t xor1 = x1[0];
for (int i = 1; i < 4; ++i) {
    xor0 ^= x0[i];
    xor1 ^= x1[i];
}
uint64_t result = uint64_t(xor1) << 32 | xor0;  // named `result` because `xor` is a reserved token in C++
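
For reference, the scalar operation above can be wrapped in a function so it can be checked against any vectorized version. This is only a minimal sketch: the name `horizontal_xor_scalar` is hypothetical and does not come from the original post.

```cpp
#include <cstdint>

// Hypothetical helper (name assumed for illustration).
// XORs the four 32-bit elements of each input and packs the two
// 32-bit results into one 64-bit value: x1's result in the high half,
// x0's result in the low half.
inline std::uint64_t horizontal_xor_scalar(const std::uint32_t (&x0)[4],
                                           const std::uint32_t (&x1)[4]) {
    std::uint32_t xor0 = x0[0];
    std::uint32_t xor1 = x1[0];
    for (int i = 1; i < 4; ++i) {
        xor0 ^= x0[i];
        xor1 ^= x1[i];
    }
    // Widen before shifting so the high half is not lost.
    return (static_cast<std::uint64_t>(xor1) << 32) | xor0;
}
```

For example, with x0 = {1, 2, 4, 8} and x1 = {0x10, 0x20, 0x40, 0x80}, the low half is 0xF and the high half is 0xF0, so the function returns 0x000000F00000000F.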

I eventually found the following code, which seems to work:

__m128i x0 = ...;
__m128i x1 = ...;

__m128i xor64_0 = _mm_unpackhi_epi64(x0, x1);
__m128i xor64_1 = _mm_unpacklo_epi64(x0, x1);

__m128i xor64 = _mm_xor_si128(xor64_0, xor64_1);
__m128i xor32_0 = _mm_shuffle_epi32(xor64, _MM_SHUFFLE(3, 1, 2, 0));
__m128i xor32_1 = _mm_shuffle_epi32(xor64, _MM_SHUFFLE(2, 0, 3, 1));
__m128i xor32 = _mm_xor_si128(xor32_0, xor32_1);

uint64_t result = _mm_cvtsi128_si64(xor32);  // named `result` because `xor` is a reserved token in C++
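
To see why this sequence gives the expected value, it may help to trace the lanes step by step. The sketch below wraps the same intrinsics in a function and annotates each step, writing a0..a3 for the 32-bit elements of x0 and b0..b3 for those of x1 (element 0 is the lowest). The function name `horizontal_xor_sse2` is hypothetical, and the code assumes an x86-64 target, since `_mm_cvtsi128_si64` is only available there.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical wrapper (name assumed for illustration) around the
// sequence from the post. Lane lists are written low element first.
inline std::uint64_t horizontal_xor_sse2(__m128i x0, __m128i x1) {
    // hi = [a2, a3, b2, b3], lo = [a0, a1, b0, b1]
    __m128i hi = _mm_unpackhi_epi64(x0, x1);
    __m128i lo = _mm_unpacklo_epi64(x0, x1);

    // x64 = [a0^a2, a1^a3, b0^b2, b1^b3]
    __m128i x64 = _mm_xor_si128(hi, lo);

    // s0 = [a0^a2, b0^b2, a1^a3, b1^b3]
    __m128i s0 = _mm_shuffle_epi32(x64, _MM_SHUFFLE(3, 1, 2, 0));
    // s1 = [a1^a3, b1^b3, a0^a2, b0^b2]
    __m128i s1 = _mm_shuffle_epi32(x64, _MM_SHUFFLE(2, 0, 3, 1));

    // x32 = [A, B, A, B], where A = a0^a1^a2^a3 and B = b0^b1^b2^b3
    __m128i x32 = _mm_xor_si128(s0, s1);

    // The low 64 bits hold A in bits 0..31 and B in bits 32..63.
    return static_cast<std::uint64_t>(_mm_cvtsi128_si64(x32));
}
```

Calling it with `_mm_setr_epi32(1, 2, 4, 8)` and `_mm_setr_epi32(0x10, 0x20, 0x40, 0x80)` should return 0x000000F00000000F, matching the scalar loop. The trace only confirms correctness; it says nothing about whether the sequence is the fastest one.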

Is this the fastest possible implementation? Would it make sense to mix integer and floating-point instructions, for example by using _mm_movehdup_ps(.)?

Source: https://stackoverflow.com/questions/42040937/horizontal-xor-of-two-sse-values
