What's the most efficient way to load and extract 32 bit integer values from a 128 bit SSE vector?

↘锁芯ラ 提交于 2019-11-28 00:33:51

问题


I'm trying to optimize my code using SSE intrinsics but am running into a problem where I don't know of a good way to extract the integer values from a vector after I've done the SSE intrinsics operations to get what I want.

Does anyone know of a good way to do this? I'm programming in C and my compiler is gcc version 4.3.2.

Thanks for all your help.


回答1:


It depends on what you can assume about the minimum level of SSE support that you have.

Going all the way back to SSE2 you have _mm_extract_epi16 (PEXTRW) which can be used to extract any 16 bit element from a 128 bit vector. You would need to call this twice to get the two halves of a 32 bit element.

In more recent versions of SSE (SSE4.1 and later) you have _mm_extract_epi32 (PEXTRD) which can extract a 32 bit element in one instruction.

Alternatively if this is not inside a performance-critical loop you can just use a union, e.g.

typedef union
{
    __m128i v;
    int32_t a[4];
} U32;



回答2:


_mm_extract_epi32

The extract intrinsics is indeed the best option but if you need to support SSE2, I'd recommend this:

inline int get_x(const __m128i& vec){return _mm_cvtsi128_si32 (vec);}
inline int get_y(const __m128i& vec){return _mm_cvtsi128_si32 (_mm_shuffle_epi32(vec,0x55));}
inline int get_z(const __m128i& vec){return _mm_cvtsi128_si32 (_mm_shuffle_epi32(vec,0xAA));}
inline int get_w(const __m128i& vec){return _mm_cvtsi128_si32 (_mm_shuffle_epi32(vec,0xFF));}

I've found that if you reinterpret_cast/union the vector to any int[4] representation the compiler tends to flush things back to memory (which may not be that bad) and reads it back as an int, though I haven't looked at the assembly to see if the latest versions of the compilers generate better code.



来源:https://stackoverflow.com/questions/4360920/whats-the-most-efficient-way-to-load-and-extract-32-bit-integer-values-from-a-1

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!