What's the most efficient way to load and extract 32 bit integer values from a 128 bit SSE vector?

一向 2020-12-18 03:38

I'm trying to optimize my code using SSE intrinsics but am running into a problem where I don't know of a good way to extract the integer values from a vector after I've

2 answers
  •  慢半拍i (original poster)
    2020-12-18 04:18

    It depends on what you can assume about the minimum level of SSE support that you have.

    Going all the way back to SSE2, you have _mm_extract_epi16 (PEXTRW), which extracts any 16-bit element from a 128-bit vector. You would need to call it twice and combine the two results to rebuild a 32-bit element from its halves.
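
    A minimal sketch of the two-call SSE2 approach. The macro name EXTRACT32_SSE2 is hypothetical, not part of any library; the lane argument is assumed to be a compile-time constant from 0 to 3, and the vector argument is evaluated twice.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_extract_epi16 */
#include <stdint.h>     /* int32_t, uint32_t */

/* Hypothetical helper: rebuild 32-bit lane `lane` (constant, 0..3) of v
   from its low and high 16-bit halves. 16-bit element 2*lane holds the
   low half and element 2*lane+1 holds the high half. */
#define EXTRACT32_SSE2(v, lane)                                           \
    ((int32_t)(((uint32_t)_mm_extract_epi16((v), 2 * (lane)) & 0xFFFFu) | \
               ((uint32_t)_mm_extract_epi16((v), 2 * (lane) + 1) << 16)))
```

    For example, with v = _mm_set_epi32(-4, 3, -2, 1), lane 0 is the last argument, so EXTRACT32_SSE2(v, 0) yields 1 and EXTRACT32_SSE2(v, 3) yields -4.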

    In more recent versions of SSE (SSE4.1 and later), you have _mm_extract_epi32 (PEXTRD), which extracts a 32-bit element in one instruction.
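
    A minimal sketch of the SSE4.1 route. The function name extract_lane2_sse41 and the TARGET_SSE41 macro are hypothetical; the target attribute is a GCC/Clang feature, used here only so the file builds without -msse4.1, and the CPU is assumed to support SSE4.1 at run time.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics (also pulls in SSE2) */
#include <stdint.h>     /* int32_t */

/* GCC/Clang only: enable SSE4.1 for the annotated function. When the
   whole file is built with SSE4.1 enabled, the attribute is unnecessary. */
#if defined(__GNUC__)
#define TARGET_SSE41 __attribute__((target("sse4.1")))
#else
#define TARGET_SSE41
#endif

/* Hypothetical helper: return 32-bit lane 2 of v.
   The lane index passed to _mm_extract_epi32 must be a compile-time constant. */
TARGET_SSE41
static int32_t extract_lane2_sse41(__m128i v)
{
    return _mm_extract_epi32(v, 2);
}
```

    Because the index has to be a constant, selecting a lane at run time needs a switch over the four cases or a different technique.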

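    Alternatively, if this is not inside a performance-critical loop, you can just use a union, e.g.

    #include <emmintrin.h>  /* __m128i (SSE2) */
    #include <stdint.h>     /* int32_t */

    typedef union
    {
        __m128i v;      /* the whole 128-bit vector */
        int32_t a[4];   /* the same bits viewed as four 32-bit lanes */
    } U32;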
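
    A minimal sketch of how such a union might be used, assuming C (where reading a union member other than the one last written is permitted). The function name extract_lane_union is hypothetical. Unlike the extract intrinsics, the lane index here can be a run-time value.

```c
#include <emmintrin.h>  /* __m128i (SSE2) */
#include <stdint.h>     /* int32_t */

typedef union
{
    __m128i v;
    int32_t a[4];
} U32;

/* Hypothetical helper: read lane i (0..3) of v through the union.
   The caller is responsible for keeping i in range. */
static int32_t extract_lane_union(__m128i v, int i)
{
    U32 u;
    u.v = v;        /* write the vector into the union */
    return u.a[i];  /* read one 32-bit lane back as a scalar */
}
```

    The compiler typically implements this by spilling the vector to memory and reloading one element, which is why it suits code outside hot loops.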
    
