Modulo 2*Pi using SSE/SSE2

坚强是说给别人听的谎言 提交于 2019-12-10 18:09:41

问题


I'm still pretty new to using SSE and am trying to implement a modulo of 2*Pi for double-precision inputs of the order 1e8 (the result of which will be fed into some vectorised trig calculations).

My current attempt at the code is based around the idea that mod(x, 2*Pi) = x - floor(x/(2*Pi))*2*Pi and looks like:

#define _PD_CONST(Name, Val)                                            \
static const double _pd_##Name[2] __attribute__((aligned(16))) = { Val, Val }  

_PD_CONST(2Pi, 6.283185307179586);  /* = 2*pi  */  
_PD_CONST(recip_2Pi, 0.159154943091895); /* = 1/(2*pi)  */

void vec_mod_2pi(const double * vec, int Size, double * modAns)
{
    __m128d sse_a, sse_b, sse_c;
    int i;
    int k = 0;
    double t = 0;

    unsigned int initial_mode;
    initial_mode = _MM_GET_ROUNDING_MODE();

    _MM_SET_ROUNDING_MODE(_MM_ROUND_DOWN);

    for (i = 0; i < Size; i += 2)
    {
        sse_a = _mm_loadu_pd(vec+i);
        sse_b = _mm_mul_pd( _mm_cvtepi32_pd( _mm_cvtpd_epi32( _mm_mul_pd(sse_a, *(__m128d*)_pd_recip_2Pi) ) ), *(__m128d*)_pd_2Pi);
        sse_c = _mm_sub_pd(sse_a, sse_b);
        _mm_storeu_pd(modAns+i,sse_c);
    }

    k = i-2;
    for (i = 0; i < Size%2; i++)
    {
        t = (double)((int)(vec[k+i] * 0.159154943091895)) * 6.283185307179586;
        modAns[k+i] = vec[k+i] - t;
    }

    _MM_SET_ROUNDING_MODE(initial_mode);
}

Unfortunately, this is currently returning a lot of NaN with a couple of answers of 1.128e119 as well (some what outside the range of 0 -> 2*Pi that I was aiming for!). I suspect that where I'm going wrong is in the double-to-int-to-double conversion that I'm trying to use to do the floor.

Can anyone suggest where I've gone wrong and how to improve it?

P.S. sorry about the format of that code, it's the first time I've posted a question on here and can't seem to get it to give me empty lines within the code block to make it readable.


回答1:


If you want any kind of accuracy, the simple algorithm is terribly bad. For an accurate range reduction algorithm, see e.g. Ng et al., ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit (now available via the Wayback Machine: 2012-12-24)




回答2:


For large arguments Hayne-Panek algorithm is typically used. However, the Hayne-Panek paper is quite difficult to read, and I suggest to have a look at Chapter 11 in the Handbook of Floating-Point Arithmetic for a more accessible explanation.



来源:https://stackoverflow.com/questions/11576202/modulo-2pi-using-sse-sse2

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!