How to vectorize a distance calculation using SSE2

再見小時候 2021-01-03 09:00

A and B are vectors of length N, where N could be in the range 20 to 200, say. I want to calculate the square of the distance between these vectors, i.e. d^2 = ||A-B||^2.

2 answers
  •  青春惊慌失措
    2021-01-03 09:12

    It's pretty straightforward to implement this using SSE intrinsics:

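    #include <pmmintrin.h>  // SSE3 header: _mm_hadd_ps below is an SSE3 intrinsic, not SSE2
    
    // assumes a and b each point to N floats, and N is a signed int
    __m128 vd2 = _mm_setzero_ps();  // 4 running partial sums, one per lane
    float d2 = 0.0f;
    int k;
    
    // process 4 elements per iteration
    for (k = 0; k + 4 <= N; k += 4)
    {
        __m128 va = _mm_loadu_ps(&a[k]);
        __m128 vb = _mm_loadu_ps(&b[k]);
        __m128 vd = _mm_sub_ps(va, vb);   // a[k..k+3] - b[k..k+3]
        vd = _mm_mul_ps(vd, vd);          // squared differences
        vd2 = _mm_add_ps(vd2, vd);        // accumulate per lane
    }
    
    // horizontal sum of the 4 partial sums of squared differences
    vd2 = _mm_hadd_ps(vd2, vd2);
    vd2 = _mm_hadd_ps(vd2, vd2);
    _mm_store_ss(&d2, vd2);
    
    // clean up any remaining elements (N not a multiple of 4)
    for ( ; k < N; ++k)
    {
        float d = a[k] - b[k];
        d2 += d * d;
    }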
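
The question title asks for SSE2, while _mm_hadd_ps is only available from SSE3 onwards. As a minimal sketch, assuming the same unaligned float inputs, the horizontal sum can instead be done with one move and one shuffle, which need nothing beyond the SSE/SSE2 baseline. The function name squared_distance_sse2 and its size_t length parameter are illustrative choices, not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 header; it also provides the SSE float intrinsics used here */

/* Hypothetical helper: squared Euclidean distance between a[0..n) and b[0..n). */
float squared_distance_sse2(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);

    __m128 vd2 = _mm_setzero_ps();  /* 4 running partial sums (v0, v1, v2, v3) */
    size_t k = 0;

    /* process 4 elements per iteration; unaligned loads work for any pointer */
    for (; k + 4 <= n; k += 4) {
        __m128 va = _mm_loadu_ps(a + k);
        __m128 vb = _mm_loadu_ps(b + k);
        __m128 vd = _mm_sub_ps(va, vb);
        vd2 = _mm_add_ps(vd2, _mm_mul_ps(vd, vd));
    }

    /* horizontal sum without _mm_hadd_ps */
    __m128 hi   = _mm_movehl_ps(vd2, vd2);   /* (v2, v3, v2, v3) */
    __m128 sum2 = _mm_add_ps(vd2, hi);       /* lane 0 = v0+v2, lane 1 = v1+v3 */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane 1 */
    float d2 = _mm_cvtss_f32(_mm_add_ss(sum2, odd));  /* (v0+v2) + (v1+v3) */

    /* clean up any remaining elements */
    for (; k < n; ++k) {
        float d = a[k] - b[k];
        d2 += d * d;
    }
    return d2;
}
```

A call such as squared_distance_sse2(a, b, N) would then stand in for the inline snippet. Because the lanes are added in a different order than in a scalar loop, the result may differ from a scalar reference in the last few bits.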
    

    Note that if you can guarantee that a and b are 16-byte aligned, you can use _mm_load_ps rather than _mm_loadu_ps, which may help performance, particularly on older (pre-Nehalem) CPUs.
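
When alignment cannot be guaranteed at compile time, one option is to check the pointers at run time and take the _mm_load_ps path only when both are 16-byte aligned. The following is a sketch under that assumption; is_aligned16 and squared_distance_auto are hypothetical names, and whether the extra branch pays off on a given CPU would need to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE float intrinsics */

/* Hypothetical helper: true if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: picks aligned or unaligned loads at run time. */
float squared_distance_auto(const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL);

    __m128 acc = _mm_setzero_ps();
    size_t k = 0;

    if (is_aligned16(a) && is_aligned16(b)) {
        /* k advances by 4 floats (16 bytes), so a + k and b + k stay aligned */
        for (; k + 4 <= n; k += 4) {
            __m128 vd = _mm_sub_ps(_mm_load_ps(a + k), _mm_load_ps(b + k));
            acc = _mm_add_ps(acc, _mm_mul_ps(vd, vd));
        }
    } else {
        /* fallback: unaligned loads are valid for any address */
        for (; k + 4 <= n; k += 4) {
            __m128 vd = _mm_sub_ps(_mm_loadu_ps(a + k), _mm_loadu_ps(b + k));
            acc = _mm_add_ps(acc, _mm_mul_ps(vd, vd));
        }
    }

    /* spill the 4 partial sums and add them in scalar code */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float d2 = (lanes[0] + lanes[2]) + (lanes[1] + lanes[3]);

    /* clean up any remaining elements */
    for (; k < n; ++k) {
        float d = a[k] - b[k];
        d2 += d * d;
    }
    return d2;
}
```

Buffers declared with _Alignas(16) (C11), or obtained from an allocator that returns 16-byte aligned memory, would take the aligned path; a pointer offset by one float, such as a + 1, would fall back to the unaligned loop.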

    Note also that for loops such as this, where there are very few arithmetic instructions relative to the number of loads, performance may well be limited by memory bandwidth, and the expected speed-up from vectorization may not be realised in practice.
