SIMD the following code

青春壹個敷衍的年華, submitted on 2019-12-03 11:17:19

Here's a fairly straightforward implementation (warning: untested code):

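#include <emmintrin.h>  // SSE2 intrinsics (__m128i, _mm_add_epi32, ...)
#include <stdint.h>     // int32_t

int32_t sum_array(const int32_t a[], const int n)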
{
    __m128i vsum = _mm_set1_epi32(0);       // initialise vector of four partial 32 bit sums
    int32_t sum;
    int i;

    for (i = 0; i < n; i += 4)
    {
        __m128i v = _mm_load_si128((const __m128i *)&a[i]);  // aligned load of 4 x 32 bit values
        vsum = _mm_add_epi32(vsum, v);      // accumulate to 32 bit partial sum vector
    }
    // horizontal add of four 32 bit partial sums and return result
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
    sum = _mm_cvtsi128_si32(vsum);
    return sum;
}

Note that the input array, a[], must be 16-byte aligned (a requirement of _mm_load_si128), and n must be a multiple of 4, because the loop consumes four elements per iteration.
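
If those two restrictions are inconvenient, one possible way to relax them is sketched below. This is an assumption-laden variant, not part of the original answer: the name sum_array_any is hypothetical, it swaps the aligned load for the unaligned _mm_loadu_si128, and it finishes any leftover elements (when n is not a multiple of 4) with a scalar loop. Like the code above, it has not been benchmarked.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>     /* int32_t */

/* Hypothetical variant of sum_array: a[] may have any alignment
   and n may be any non-negative length. */
int32_t sum_array_any(const int32_t a[], const int n)
{
    __m128i vsum = _mm_setzero_si128();     /* four partial 32 bit sums, all zero */
    int32_t sum;
    int i;

    /* vector loop: runs only while a full group of 4 elements remains */
    for (i = 0; i <= n - 4; i += 4)
    {
        __m128i v = _mm_loadu_si128((const __m128i *)&a[i]);  /* unaligned load */
        vsum = _mm_add_epi32(vsum, v);
    }

    /* horizontal add of the four partial sums */
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 8));
    vsum = _mm_add_epi32(vsum, _mm_srli_si128(vsum, 4));
    sum = _mm_cvtsi128_si32(vsum);

    /* scalar tail: the remaining 0 to 3 elements */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

When the data is known to be aligned and padded, the original aligned version remains the simpler choice; the unaligned load mainly buys flexibility for callers that cannot control allocation.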
