My SSE implementation of lookAt doesn't work

99封情书 提交于 2019-12-11 00:39:48

问题


So, I'm writing a math library using SSE intrinsics to use with my OpenGL application. Right now I'm implementing some of the more important functions like lookAt, using the glm library to check for correctness, but for some reason my implementation of lookAt isn't working as it should.

Here's the source code:

inline void lookAt(__m128 position, __m128 target, __m128 up)
{
    /* Get the target vector relative to the camera position */
    __m128 t = vec4::normalize3(_mm_sub_ps(target, position));
    __m128 u = vec4::normalize3(up);
    /* Get the right vector by crossing target and up. */
    __m128 r = vec4::normalize3(vec4::cross(t, u));
    /* Correct the up vector by crossing right and target. */
    u = vec4::cross(r, t);
    /* Negate the target vector. */
    t = _mm_sub_ps(_mm_setzero_ps(), t);

    /* Treat the right, up, and target vector as a matrix, and transpose it. */
    /* Conveniently, this also sets the w component of all four to 0.0f */
    _MM_TRANSPOSE4_PS(r, u, t, _mm_setr_ps(0.0f, 0.0f, 0.0f, 1.0f));

    vec4 pos = _mm_sub_ps(_mm_setzero_ps(), position);
    pos.w = 1.0f;

    /* Multiply our matrix by the transposed vectors. */
    mat4 temp;
    temp.col0 = r;
    temp.col1 = u;
    temp.col2 = t;
    temp.col3 = _mm_setr_ps(0.0f, 0.0f, 0.0f, 1.0f);

    multiply(temp);
    translate(pos);
}

My matrices are column-major, stored internally as "__m128 col0, col1, col2, col3;".

I made this after reading the man pages Here for gluLookAt. Once I realized that the right, up, and target vectors looked an awful lot like a row-major matrix, it was simple for me to transpose them so I could assign them to the rotation matrix.

The code for normalize3, in case it helps:

inline static __m128 normalize3(const __m128& vec)
{
    __m128 v = _mm_mul_ps(vec, vec);
    v = _mm_add_ps(
        _mm_add_ps(
            _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0)),
            _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1))),
        _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2)));

    return _mm_mul_ps(vec, _mm_rsqrt_ps(v));
}

It saves a couple of calls by ignoring the w component of the vector.

What am I doing wrong?

Here's some sample output. Using position(5.0, 5.0, 0.0), target(10.0, 20.0, 55.0), and up (0.0, 1.0, 0.0), I get:

From GLM:

  • [-0.9959] [ 0.0000] [ 0.0905] [ 4.9795]
  • [-0.0237] [ 0.9650] [-0.2610] [-4.7065]
  • [-0.0874] [-0.2621] [-0.9611] [ 1.7474]
  • [ 0.0000] [ 0.0000] [ 0.0000] [ 1.0000]

From my lookAt():

  • [-0.9959] [ 0.0000] [ 0.0905] [-5.0000]
  • [-0.0237] [ 0.9651] [-0.2610] [-5.0000]
  • [-0.0874] [-0.2621] [-0.9611] [ 0.0000]
  • [ 0.0000] [ 0.0000] [ 0.0000] [ 1.0000]

It seems that the only difference is in the third column, but I'm honestly not sure which of the two is correct. I'm inclined to say that GLM's is correct, since it was designed to be identical to the glu version.

EDIT: I just discovered something interesting. If I call "translate(pos);" before calling "multiply(temp);", my resulting matrix is exactly the same as glm's. Which is correct? According to the OpenGL man page on gluLookAt, this (and thus glm) is doing it backwards. Was I doing it right before, or it correct now?


回答1:


One problem could be with _mm_rsqrt_ps(v). It's not very accurate. Replace it with _mm_div_ps(_mm_set1_ps(1.0f),_mm_sqrt_ps(v)). If that fixes the problem then you might be able to speed it up with some kind of root polishing Newton Raphson with SSE2 - can someone explain me these 3 lines

Another suggestion, you can make your function more SIMD friendly by not doing horizontal operations (which you do in your normalization function). Instead of normalizing the vectors before you transpose you can transpose first. This takes the vectors from (x,y,z,w) to (x,x,x,x), (y,y,y,y), (z,z,z,z), (w,w,w,w) - an Array of Structs (AoS) to a Struct of Arrays (SoA). Then you only need to do 1.0f/sqrt(rr+uu+t*t) to normalize.

__m128 t = _mm_sub_ps(target, position));
__m128 u = up;
__m128 r = vec4::cross(t, u);
u = vec4::cross(r, t);
t = _mm_sub_ps(_mm_setzero_ps(), t);
_MM_TRANSPOSE4_PS(r, u, t, _mm_setr_ps(0.0f, 0.0f, 0.0f, 1.0f));  //AoS to SoA

//now normalize
__m128 den = _mm_add_ps(_mm_add_ps(_mm_mul_ps(r,r),_mm_mul_ps(u,u)), _mm_mul_ps(t,t));
__m128 norm = _mm_div_ps(_mm_set1_ps(1.0f), _mm_sqrt_ps(den));
r= _mm_mul_ps(norm,r); u =_mm_mul_ps(norm,u); t = _mm_mul_ps(norm,t);

norm is not a single scalar. It contains the four different normalizations (n1,n2,n3,n4) so norm*r = (n1*x1, n2*x2, n3*x3, n4*x4). See this link for an efficient way to do matrix multiplication with SSE

Efficient 4x4 matrix vector multiplication with SSE: horizontal add and dot product - what's the point?




回答2:


I figured out the problem. My multiply function was multiplying matrices in the wrong order.



来源:https://stackoverflow.com/questions/21666736/my-sse-implementation-of-lookat-doesnt-work

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!