Using AVX intrinsics instead of SSE does not improve speed — why?

借酒劲吻你 2021-01-30 03:59

I've been using Intel's SSE intrinsics for quite some time with good performance gains. Hence, I expected the AVX intrinsics to further speed up my programs. This, unfortunate

4 answers
  •  渐次进展
    2021-01-30 04:51

    If you want faster square roots, you can replace VSQRTPS with VRSQRTPS (an approximate reciprocal square root) followed by one Newton-Raphson step:

    x0 = vrsqrtps(a)
    x1 = 0.5 * x0 * (3 - (a * x0) * x0)
    

    Here x1 approximates 1/sqrt(a), so sqrt(a) = a * x1. VRSQRTPS itself doesn't benefit from AVX, but the other calculations do.

    Use it only if about 23 bits of precision are enough for you.
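
The two-line recipe above could look roughly like this with AVX intrinsics. This is a minimal sketch, not a tuned implementation: the helper names `rsqrt_nr_ps` and `sqrt_approx_avx` are made up for illustration, the array length is assumed to be a multiple of 8, and inputs are assumed to be positive and finite. The `target("avx")` attribute is a GCC/Clang-specific convenience so the file builds without `-mavx`; with other compilers, drop it and enable AVX through the compiler options instead.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical helper: VRSQRTPS estimate of 1/sqrt(a), refined by one
 * Newton-Raphson step: x1 = 0.5 * x0 * (3 - (a * x0) * x0). */
__attribute__((target("avx")))
static __m256 rsqrt_nr_ps(__m256 a)
{
    const __m256 half  = _mm256_set1_ps(0.5f);
    const __m256 three = _mm256_set1_ps(3.0f);

    __m256 x0  = _mm256_rsqrt_ps(a);                 /* x0: rough estimate (roughly 12 bits) */
    __m256 ax0 = _mm256_mul_ps(a, x0);               /* a * x0 */
    __m256 t   = _mm256_sub_ps(three,
                               _mm256_mul_ps(ax0, x0)); /* 3 - (a * x0) * x0 */
    return _mm256_mul_ps(_mm256_mul_ps(half, x0), t);   /* 0.5 * x0 * t */
}

/* Hypothetical helper: approximate sqrt over an array via sqrt(a) = a * (1/sqrt(a)).
 * Assumes n is a multiple of 8 and every input is positive and finite
 * (a == 0 would produce NaN, because 0 * inf is NaN). */
__attribute__((target("avx")))
static void sqrt_approx_avx(const float *in, float *out, size_t n)
{
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8) {
        __m256 a = _mm256_loadu_ps(in + i);          /* unaligned load of 8 floats */
        _mm256_storeu_ps(out + i, _mm256_mul_ps(a, rsqrt_nr_ps(a)));
    }
}
```

Calling `sqrt_approx_avx(in, out, 8)` on eight positive floats fills `out` with their approximate square roots. Whether this beats VSQRTPS depends on the CPU, so it is worth measuring before committing to it.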
