I am trying to compute the sum of 32 elements (1 byte each) on an Intel i3 processor. This is what I have:
s = 0;
for (i = 0; i < 32; i++)
{
    s = s + a[i];
}
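Another way to sum all the elements of an array is to use SIMD instructions. Note that the code below is labelled SSE, but the 256-bit __m256 type it uses belongs to AVX. Also note that casting a to __m256* and dereferencing it assumes the input array is 32-byte aligned.
The code works for float arrays of any size.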
float sse_array_sum(float *a, int size)
{
    /*
     * sum += a[i] (for all i in domain)
     */
    float *sse_sum, sum=0;
    if(size >= 8)
    {
        // sse_sum[8]
        posix_memalign((void **)&sse_sum, 32, 8*sizeof(float));
        __m256 temp_sum;
        __m256* ptr_a = (__m256*)a;
        int itrs = size/8-1;
        // sse_sum[0:7] = a[0:7]
        temp_sum = *ptr_a;
        a += 8;
        ptr_a++;
        for(int i=0; i
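The listing above is cut off, so here is a minimal, self-contained sketch of the same idea. It is not the original function: it swaps in 128-bit SSE intrinsics (four floats per step instead of eight) so that it builds on x86-64 without extra compiler flags, and it uses unaligned loads so the input needs no special alignment. The name sse_array_sum_sketch is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps */

/* Hypothetical helper (not the original sse_array_sum): sums `size` floats
 * four at a time with 128-bit SSE, then handles the leftover elements. */
float sse_array_sum_sketch(const float *a, size_t size)
{
    assert(a != NULL || size == 0);

    __m128 acc = _mm_setzero_ps();      /* four partial sums, all zero */
    size_t i = 0;

    /* Main loop: unaligned loads, so `a` needs no special alignment. */
    for (; i + 4 <= size; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));

    /* Horizontal step: spill the four lanes and add them together. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for sizes that are not a multiple of 4. */
    for (; i < size; i++)
        sum += a[i];

    return sum;
}
```

Because the vector version adds the elements in a different order than the sequential loop, the two results can differ slightly in the last bits for large arrays; that is a property of floating-point addition, not a bug in either version.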
Benchmark:
size = 64000000
a[i] = 3141592.65358 for all i in domain
sequential version time: 194ms
SSE version time: 49ms
Machine specification:
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
CPU MHz: 1700.072
OS: Ubuntu
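For the case in the question (exactly 32 one-byte elements), a float routine is not a direct fit. One possible approach is the SSE2 psadbw instruction, exposed as _mm_sad_epu8: taking the sum of absolute differences against zero adds up each group of 8 bytes. The sketch below assumes the elements are unsigned bytes, which the question does not state, and the function name sum32_bytes_sse2 is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: __m128i, _mm_sad_epu8, _mm_add_epi64 */

/* Hypothetical helper: sums exactly 32 unsigned bytes starting at `a`.
 * Assumption: the elements are unsigned (uint8_t). */
uint32_t sum32_bytes_sse2(const uint8_t *a)
{
    assert(a != NULL);

    const __m128i zero = _mm_setzero_si128();

    /* Load the 32 bytes as two 128-bit vectors (unaligned loads). */
    __m128i lo = _mm_loadu_si128((const __m128i *)a);
    __m128i hi = _mm_loadu_si128((const __m128i *)(a + 16));

    /* |x - 0| summed per 8 bytes: each 64-bit half holds one partial sum. */
    __m128i s = _mm_add_epi64(_mm_sad_epu8(lo, zero),
                              _mm_sad_epu8(hi, zero));

    /* Fold the upper 64-bit half onto the lower one. */
    s = _mm_add_epi64(s, _mm_unpackhi_epi64(s, s));

    /* The total is at most 32 * 255 = 8160, so the low 32 bits suffice. */
    return (uint32_t)_mm_cvtsi128_si32(s);
}
```

With only 32 elements the whole reduction is a handful of instructions, so any gain over the plain loop is best confirmed by measuring on the target machine rather than assumed.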