How to take the absolute value of 2 doubles or 4 floats using the SSE instruction set? (Up to SSE4)

栀梦 2020-12-29 07:17

Here's the sample C code that I am trying to accelerate using SSE. The two arrays are 3072 elements long and hold doubles; I may drop down to float if I don't need the precisi

3 answers
  •  滥情空心
    2020-12-29 08:00

    Probably the easiest way is as follows:

    __m128d vsum = _mm_set1_pd(0.0);        // init partial sums
    for (k = 0; k < 3072; k += 2)
    {
        __m128d va = _mm_load_pd(&sima[k]); // load 2 doubles from sima, simb
        __m128d vb = _mm_load_pd(&simb[k]);
        __m128d vdiff = _mm_sub_pd(va, vb); // calc diff = sima - simb
        __m128d vnegdiff = _mm_sub_pd(_mm_set1_pd(0.0), vdiff); // calc neg diff = 0.0 - diff
        __m128d vabsdiff = _mm_max_pd(vdiff, vnegdiff);        // calc abs diff = max(diff, - diff)
        vsum = _mm_add_pd(vsum, vabsdiff);  // accumulate two partial sums
    }
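
    The loop above gets the absolute value from max(diff, -diff). A commonly used alternative, which is not part of the answer above, is to clear the sign bit of each lane with a bitwise AND-NOT. The sketch below shows that variant for two doubles; the helper name abs_pd is hypothetical, and it assumes the compiler keeps -0.0 as a distinct constant (aggressive fast-math settings may not).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: absolute value of both double lanes of v.
 * -0.0 has only the sign bit set, so (~mask & v) clears the sign bit
 * and leaves exponent and mantissa untouched. */
static __m128d abs_pd(__m128d v)
{
    const __m128d sign_mask = _mm_set1_pd(-0.0);
    return _mm_andnot_pd(sign_mask, v);
}
```

    With such a helper, the vnegdiff and vabsdiff lines in the loop could be replaced by a single call on vdiff.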
    

    Note that this may not be any faster than scalar code on modern x86 CPUs, which typically have two FPUs anyway. However, if you can drop down to single precision, each SSE register holds four floats instead of two doubles, so you may well get a 2x throughput improvement.
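
    A single-precision version of the same loop might look like the sketch below. It keeps the max(diff, -diff) approach but processes four floats per iteration. The function name sum_abs_diff_ps and its parameters are hypothetical; it assumes n is a multiple of 4 and uses unaligned loads so that it makes no assumption about how the arrays are allocated.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: sum of |a[k] - b[k]| over n floats.
 * Assumes n is a multiple of 4. */
static float sum_abs_diff_ps(const float *a, const float *b, int n)
{
    __m128 vsum = _mm_setzero_ps();                 /* four partial sums */
    for (int k = 0; k < n; k += 4)
    {
        __m128 va = _mm_loadu_ps(&a[k]);            /* load 4 floats from each array */
        __m128 vb = _mm_loadu_ps(&b[k]);
        __m128 vdiff = _mm_sub_ps(va, vb);          /* diff = a - b */
        __m128 vnegdiff = _mm_sub_ps(_mm_setzero_ps(), vdiff); /* 0.0 - diff */
        __m128 vabsdiff = _mm_max_ps(vdiff, vnegdiff);         /* max(diff, -diff) */
        vsum = _mm_add_ps(vsum, vabsdiff);          /* accumulate four partial sums */
    }

    /* Combine the four partial sums into one scalar. */
    float lanes[4];
    _mm_storeu_ps(lanes, vsum);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

    Keep in mind that float accumulation over 3072 elements loses more precision than the double version, so it is worth comparing the result against the scalar code before switching.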

    Note also that you will need to combine the two partial sums in vsum into a scalar value after the loop, but this is fairly trivial to do and is not performance-critical.
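
    One possible way to combine the two partial sums is sketched below. The helper name hsum_pd is hypothetical; it only uses SSE2 intrinsics, so it stays within the "up to SSE4" constraint from the question.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: add the low and high double lanes of v. */
static double hsum_pd(__m128d v)
{
    __m128d hi = _mm_unpackhi_pd(v, v);  /* copy the high lane into the low lane */
    __m128d s  = _mm_add_sd(v, hi);      /* low lane = v[0] + v[1] */
    return _mm_cvtsd_f64(s);             /* extract the low lane as a scalar */
}
```

    After the loop, the final result would then be obtained with something like `double sum = hsum_pd(vsum);`.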
