Newton-Raphson with SSE2 - can someone explain these 3 lines to me

谎友^ 2020-12-05 13:36

I'm reading this document: http://software.intel.com/en-us/articles/interactive-ray-tracing

and I stumbled upon these three lines of code:

Th

2 Answers
  •  青春惊慌失措
    2020-12-05 14:01

    Given the Newton iteration y_{n+1} = y_n * (3 - x * (y_n)^2) / 2, it should be quite straightforward to see this in the source code.
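For readers wondering where that iteration comes from, here is a short derivation sketch. It assumes the usual choice of applying Newton's method to f(y) = 1/y^2 - x, whose positive root is y = 1/sqrt(x); the answer itself does not state which f was used.

```latex
\begin{align*}
f(y) &= \frac{1}{y^2} - x, \qquad f'(y) = -\frac{2}{y^3} \\
y_{n+1} &= y_n - \frac{f(y_n)}{f'(y_n)}
         = y_n + \frac{y_n^3}{2}\left(\frac{1}{y_n^2} - x\right)
         = y_n + \frac{y_n - x\,y_n^3}{2}
         = \frac{y_n\,(3 - x\,y_n^2)}{2}
\end{align*}
```

The last form needs only multiplications and one subtraction, which is why it maps onto a handful of SSE instructions.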

     // Assumes three and half hold 3.0f and 0.5f in all four lanes, e.g. _mm_set1_ps( 3.0f ).
     __m128 nr   = _mm_rsqrt_ps( x );                     // the initial approximation y_0
     __m128 muls = _mm_mul_ps( _mm_mul_ps( x, nr ), nr ); // muls = x*nr*nr == x*(y_0)^2
     result = _mm_mul_ps(
                   _mm_sub_ps( three, muls ),   // 3.0 - muls
                   _mm_mul_ps( half, nr )       // multiplied by y_0 / 2, i.e. y_0 * 0.5
     );
    

    And to be precise, this algorithm is for the inverse square root.
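To try the three lines in isolation, a minimal self-contained sketch might look like the following. The function name `rsqrt_nr_ps` is made up for illustration, and the constants `three` and `half` are assumed to be plain broadcasts of 3.0f and 0.5f, since the original article's definitions are not shown here. It targets x86 with SSE enabled.

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rsqrt_ps, _mm_mul_ps, _mm_sub_ps */

/* Hypothetical helper: approximate 1/sqrt(x) in each of the four lanes,
 * using the hardware estimate followed by one Newton-Raphson step. */
static inline __m128 rsqrt_nr_ps(__m128 x)
{
    const __m128 three = _mm_set1_ps(3.0f);  /* assumed definition */
    const __m128 half  = _mm_set1_ps(0.5f);  /* assumed definition */

    __m128 nr   = _mm_rsqrt_ps(x);                    /* y_0, rough estimate */
    __m128 muls = _mm_mul_ps(_mm_mul_ps(x, nr), nr);  /* x * y_0^2 */

    /* y_1 = (y_0 / 2) * (3 - x * y_0^2) */
    return _mm_mul_ps(_mm_mul_ps(half, nr), _mm_sub_ps(three, muls));
}
```

Loading the values 1, 4, 16 and 100 with `_mm_loadu_ps` and passing them through this function should give lanes close to 1, 0.5, 0.25 and 0.1.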

    Note that this still doesn't give a fully accurate result. rsqrtps with one NR iteration gives almost 23 bits of accuracy, vs. sqrtps's 24 bits with correct rounding for the last bit.

    The limited accuracy is an issue if you want to truncate the result to an integer: (int)4.99999 is 4. Also, watch out for the x == 0.0 case if using sqrt(x) ~= x * rsqrt(x), because rsqrt(0) is +Inf and 0 * +Inf = NaN.
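One possible way to guard the x == 0.0 case is to mask the product with a compare result, so that lanes where x is zero come out as 0 instead of NaN. This is only a sketch: the name `sqrt_via_rsqrt_ps` is hypothetical, the masking approach is one option among several, and the result still has the limited accuracy described above.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: approximate sqrt(x) as x * rsqrt(x) per lane,
 * forcing lanes with x == 0 to 0 instead of 0 * +Inf = NaN. */
static inline __m128 sqrt_via_rsqrt_ps(__m128 x)
{
    const __m128 three = _mm_set1_ps(3.0f);
    const __m128 half  = _mm_set1_ps(0.5f);

    /* Inverse square root: hardware estimate plus one Newton-Raphson step. */
    __m128 nr   = _mm_rsqrt_ps(x);
    __m128 muls = _mm_mul_ps(_mm_mul_ps(x, nr), nr);
    __m128 rs   = _mm_mul_ps(_mm_mul_ps(half, nr), _mm_sub_ps(three, muls));

    /* sqrt(x) ~= x * rsqrt(x); this is NaN in lanes where x == 0. */
    __m128 s = _mm_mul_ps(x, rs);

    /* All-ones mask where x != 0, all-zeros where x == 0. */
    __m128 nonzero = _mm_cmpneq_ps(x, _mm_setzero_ps());

    /* Bitwise AND keeps s where x != 0 and yields +0.0f where x == 0. */
    return _mm_and_ps(s, nonzero);
}
```

If exact rounding matters more than speed, calling `_mm_sqrt_ps` directly avoids both the accuracy and the zero problem.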
