Use load/store correctly

Submitted by 烂漫一生 on 2019-12-08 04:21:52

Question


How do I use load/store to do aligned int16_t byte swapping correctly?

void byte_swapping(uint16_t* dest, const uint16_t* src,
                              size_t count) {
    __m128i _s, _d;
    for (uint16_t const * end(dest + count); dest != end; dest += 8, src += 8)
    {
        _s = _mm_load_si128((__m128i*)src);
        _d = _mm_or_si128(_mm_slli_epi16(_s, 8), _mm_srli_epi16(_s, 8));
        _mm_store_si128((__m128i*) dest, _d);
    }
}

Answer 1:


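Your code will fail when count is not a multiple of 8, or when either src or dest is not 16-byte aligned. In the first case dest steps over end without ever being equal to it, so the loop keeps running past the end of both buffers. In the second case, _mm_load_si128 and _mm_store_si128 require 16-byte-aligned addresses and will typically fault if given anything else.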

Here is a fixed (and tested) version of your code:

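#include <emmintrin.h>  // SSE2 intrinsics
#include <stddef.h>     // size_t
#include <stdint.h>     // uint16_t

void byte_swapping(uint16_t* dest, const uint16_t* src, size_t count)
{
    size_t i;
    for (i = 0; i + 8 <= count; i += 8) // 8 x 16-bit elements per 128-bit vector
    {
        // unaligned load/store: no alignment requirement on src or dest
        __m128i s = _mm_loadu_si128((const __m128i*)&src[i]);
        __m128i d = _mm_or_si128(_mm_slli_epi16(s, 8), _mm_srli_epi16(s, 8));
        _mm_storeu_si128((__m128i*)&dest[i], d);
    }
    for ( ; i < count; ++i) // handle residual elements
    {
        uint16_t w = src[i];
        w = (uint16_t)((w >> 8) | (w << 8));
        dest[i] = w;
    }
}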
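
If you do want to keep the aligned load/store from the question, one option is to make the alignment requirement an explicit precondition. The following is only a sketch under that assumption: the name byte_swapping_aligned is made up for illustration, and the assert calls only catch misaligned pointers in debug builds. Because each iteration advances by 8 elements (16 bytes), the pointers stay aligned as long as they start aligned.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uint16_t, uintptr_t */

/* Hypothetical aligned-only variant (not from the original answer).
 * Assumption: dest and src both start on a 16-byte boundary. */
void byte_swapping_aligned(uint16_t *dest, const uint16_t *src, size_t count)
{
    size_t i;

    /* Check the precondition in debug builds. */
    assert(((uintptr_t)dest & 15u) == 0);
    assert(((uintptr_t)src & 15u) == 0);

    /* Full vectors: 8 x 16-bit elements = 16 bytes per iteration. */
    for (i = 0; i + 8 <= count; i += 8)
    {
        __m128i s = _mm_load_si128((const __m128i *)&src[i]);
        __m128i d = _mm_or_si128(_mm_slli_epi16(s, 8), _mm_srli_epi16(s, 8));
        _mm_store_si128((__m128i *)&dest[i], d);
    }

    /* Scalar loop for the remaining (count % 8) elements. */
    for ( ; i < count; ++i)
    {
        uint16_t w = src[i];
        dest[i] = (uint16_t)((w >> 8) | (w << 8));
    }
}
```

Buffers passed to this variant need to come from an allocator that guarantees 16-byte alignment, for example _mm_malloc(n * sizeof(uint16_t), 16) paired with _mm_free. If you cannot guarantee that, the unaligned version above is the safer choice.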


Source: https://stackoverflow.com/questions/31165597/use-load-store-correctly
