SSE byte and half word swapping
Question

I would like to translate this code using SSE intrinsics:

    for (uint32_t i = 0; i < length; i += 4, src += 4, dest += 4) {
        uint32_t value = *(uint32_t*)src;
        *(uint32_t*)dest = ((value >> 16) & 0xFFFF) | (value << 16);
    }

Is anyone aware of an intrinsic that performs this 16-bit word swap?

Answer 1

pshufb (SSSE3) should be faster than two shifts and an OR. A slight change to the shuffle mask also turns the word swap into a full endian conversion (reversing all four bytes of each 32-bit word) instead of just a word swap. Stealing Paul R's function
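Paul R's function itself is not reproduced here, so the following is only a minimal sketch of the techniques discussed, assuming x86-64, byte buffers whose `length` is counted in bytes, and unaligned loads/stores. All function names and the `SSSE3_FN` macro are hypothetical. It shows the direct SSE2 translation (two shifts and an OR), the SSE2 `pshuflw`/`pshufhw` pair (`_mm_shufflelo_epi16`/`_mm_shufflehi_epi16`), the SSSE3 `pshufb` version, and the modified `pshufb` mask for a full 32-bit byte swap. The `target("ssse3")` attribute is GCC/Clang-specific; with other compilers, enable SSSE3 through the build flags (e.g. `-mssse3`) instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <emmintrin.h>  /* SSE2 */
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (pshufb) */

/* Hypothetical helper macro: enable SSSE3 only for the functions that need it. */
#if defined(__GNUC__)
#define SSSE3_FN __attribute__((target("ssse3")))
#else
#define SSSE3_FN
#endif

/* Scalar reference: swap the two 16-bit halves of each 32-bit word. */
static void swap_halfwords_scalar(const uint8_t *src, uint8_t *dest, size_t length)
{
    for (size_t i = 0; i + 4 <= length; i += 4) {
        uint32_t value;
        memcpy(&value, src + i, 4);             /* avoids strict-aliasing problems */
        value = (value >> 16) | (value << 16);
        memcpy(dest + i, &value, 4);
    }
}

/* SSE2: direct translation of the scalar code (two shifts and an OR). */
static __m128i swap_halfwords_shift(__m128i v)
{
    return _mm_or_si128(_mm_slli_epi32(v, 16), _mm_srli_epi32(v, 16));
}

/* SSE2 alternative: pshuflw + pshufhw swap adjacent 16-bit words in each half. */
static __m128i swap_halfwords_pshuflw(__m128i v)
{
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
    return _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
}

/* SSSE3: one pshufb; mask byte i is the index of the source byte for result byte i. */
static SSSE3_FN __m128i swap_halfwords_pshufb(__m128i v)
{
    const __m128i mask = _mm_setr_epi8(2, 3, 0, 1, 6, 7, 4, 5,
                                       10, 11, 8, 9, 14, 15, 12, 13);
    return _mm_shuffle_epi8(v, mask);
}

/* SSSE3: the "slightly modified" mask reverses all 4 bytes of each word (endian swap). */
static SSSE3_FN __m128i bswap32_pshufb(__m128i v)
{
    const __m128i mask = _mm_setr_epi8(3, 2, 1, 0, 7, 6, 5, 4,
                                       11, 10, 9, 8, 15, 14, 13, 12);
    return _mm_shuffle_epi8(v, mask);
}

/* Vectorized loop: 16 bytes per iteration, scalar code for any remaining tail. */
static SSSE3_FN void swap_halfwords_sse(const uint8_t *src, uint8_t *dest, size_t length)
{
    size_t i = 0;
    for (; i + 16 <= length; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dest + i), swap_halfwords_pshufb(v));
    }
    swap_halfwords_scalar(src + i, dest + i, length - i);
}
```

On a little-endian machine the word `[b0 b1 b2 b3]` becomes `[b2 b3 b0 b1]`, which is why the word-swap mask starts `2, 3, 0, 1`; the endian-swap mask simply reverses each group of four indices instead. The `pshuflw`/`pshufhw` pair gives the same result as `pshufb` when SSSE3 is not available.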