How to absolute 2 double or 4 floats using SSE instruction set? (Up to SSE4)

前端未结

关注

 3  2076

栀梦 2020-12-29 07:17

Here\'s the sample C code that I am trying to accelerate using SSE, the two arrays are 3072 element long with doubles, may drop it down to float if i don\'t need the precisi

3条回答

滥情空心 (楼主)

2020-12-29 08:00

Probably the easiest way is as follows:

__m128d vsum = _mm_set1_pd(0.0);        // init partial sums
for (k = 0; k < 3072; k += 2)
{
    __m128d va = _mm_load_pd(&sima[k]); // load 2 doubles from sima, simb
    __m128d vb = _mm_load_pd(&simb[k]);
    __m128d vdiff = _mm_sub_pd(va, vb); // calc diff = sima - simb
    __m128d vnegdiff = mm_sub_pd(_mm_set1_pd(0.0), vdiff); // calc neg diff = 0.0 - diff
    __m128d vabsdiff = _mm_max_pd(vdiff, vnegdiff);        // calc abs diff = max(diff, - diff)
    vsum = _mm_add_pd(vsum, vabsdiff);  // accumulate two partial sums
}

Note that this may not be any faster than scalar code on modern x86 CPUs, which typically have two FPUs anyway. However if you can drop down to single precision then you may well get a 2x throughput improvement.

Note also that you will need to combine the two partial sums in vsum into a scalar value after the loop, but this is fairly trivial to do and is not performance-critical.

0 讨论(0)

查看其它3个回答