Sum reduction of unsigned bytes without overflow, using SSE2 on Intel

-上瘾入骨i 2020-11-30 15:15

I am trying to compute the sum reduction of 32 elements (each 1 byte of data) on an Intel i3 processor. I did this:

s=0; 
for (i=0; i<32; i++)
{
    s = s + a[i];
}         
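
For comparison, one widely used SSE2 idiom for this kind of byte sum is _mm_sad_epu8 against an all-zero vector: it adds each group of 8 unsigned bytes into a 64-bit lane, so the 8-bit elements cannot overflow. The following is only a minimal sketch; the function name sum_u8_sse2 is hypothetical, and it assumes the total fits in 32 bits (always true for 32 elements, where the maximum is 32 * 255 = 8160).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums n unsigned bytes without 8-bit overflow.
 * Assumes the total fits in 32 bits. */
uint32_t sum_u8_sse2(const uint8_t *a, size_t n)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();  /* two 64-bit partial sums */
    size_t i = 0;

    /* The sum of absolute differences against zero is the plain sum of
     * each group of 8 bytes, delivered in a 64-bit lane. */
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(a + i));
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }

    /* Fold the high 64-bit lane onto the low one and extract the result. */
    acc = _mm_add_epi64(acc, _mm_srli_si128(acc, 8));
    uint32_t s = (uint32_t)_mm_cvtsi128_si32(acc);

    /* Scalar tail for lengths that are not a multiple of 16. */
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

For the 32-element case in the question, the vector loop runs twice and the scalar tail is never entered. The unaligned load (_mm_loadu_si128) is used so that a does not need to be 16-byte aligned.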


        
3 Answers
  •  南笙 (original poster)
    2020-11-30 15:30

    There is one more way to find the sum of all elements of an array using SIMD instructions. The code uses the following AVX constructs (the 256-bit __m256 type and the _mm256_* intrinsics belong to AVX, not SSE).

    • __m256 register
    • _mm256_store_ps(float *a, __m256 b)
    • _mm256_add_ps(__m256 a, __m256 b)

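    The code works for an array of floats of any size, provided a is 32-byte aligned (it is read through a __m256 pointer). Note that it sums floats, not the unsigned bytes from the question.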

    float sse_array_sum(float *a, int size)
    {
        /*
         *   sum += a[i] (for all i in domain)
         */
    
        float *sse_sum, sum=0;
        if(size >= 8)
        {
            // sse_sum[8]
            posix_memalign((void **)&sse_sum, 32, 8*sizeof(float));
    
            __m256 temp_sum;
            __m256* ptr_a = (__m256*)a;
            int itrs = size/8-1;
    
            // sse_sum[0:7] = a[0:7]
            temp_sum = *ptr_a;
            a += 8;
            ptr_a++;
    
            for(int i=0; i
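
The listing above breaks off inside the loop header. As a rough sketch of how an AVX float sum along these lines could be written in full (this is not the original poster's code): the name avx_array_sum is hypothetical, unaligned loads (_mm256_loadu_ps) are swapped in for the __m256 pointer dereference so the input need not be 32-byte aligned, and the target attribute assumes GCC or Clang.

```c
#include <immintrin.h>  /* AVX intrinsics */

/* Hypothetical sketch, not the original poster's code.
 * The target attribute (GCC/Clang only) enables AVX for this function
 * even when the file is compiled without -mavx. */
__attribute__((target("avx")))
float avx_array_sum(const float *a, int size)
{
    float lanes[8];
    float sum = 0.0f;
    int i = 0;

    if (size >= 8) {
        /* acc[0:7] = a[0:7] */
        __m256 acc = _mm256_loadu_ps(a);

        /* Add the remaining full groups of 8 floats lane by lane. */
        for (i = 8; i + 8 <= size; i += 8)
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(a + i));

        /* Spill the 8 partial sums and reduce them in scalar code. */
        _mm256_storeu_ps(lanes, acc);
        for (int k = 0; k < 8; k++)
            sum += lanes[k];
    }

    /* Leftover elements when size is not a multiple of 8. */
    for (; i < size; i++)
        sum += a[i];
    return sum;
}
```

Keeping the partial sums in a stack array avoids the posix_memalign allocation, at the cost of an unaligned store.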

    Benchmark:

    size = 64000000
    a[i] = 3141592.65358 for all i in domain

    sequential version time: 194 ms
    SIMD version (sse_array_sum) time: 49 ms (roughly a 4x speedup)
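
A timing like the one above could be collected with a small harness along the following lines. This is only a sketch under assumptions: the names scalar_array_sum and time_sum_ms are hypothetical, and it uses the standard clock() function, which reports CPU time and may not match how the original numbers were measured.

```c
#include <time.h>

/* Plain scalar reference sum, standing in for the "sequential version". */
float scalar_array_sum(const float *a, int size)
{
    float sum = 0.0f;
    for (int i = 0; i < size; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical helper: runs fn once over a[0..size) and returns the
 * elapsed CPU time in milliseconds; the computed sum goes to *result. */
double time_sum_ms(float (*fn)(const float *, int),
                   const float *a, int size, float *result)
{
    clock_t start = clock();
    *result = fn(a, size);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

To mirror the setup above, a would hold 64000000 floats (about 256 MB), each set to 3141592.65358f, and each sum function would be passed to time_sum_ms in turn.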

    Machine specification:

    Thread(s) per core: 2
    Core(s) per socket: 2
    Socket(s): 1
    CPU MHz: 1700.072
    OS: Ubuntu
