gcc 4.8 AVX optimization bug: extra code insertion?

末鹿安然 submitted on 2019-12-02 00:06:56

I think what you are seeing in the generated code is an additional iteration of Newton-Raphson to refine the initial estimate provided by vrcpps. (See: the Intel Intrinsics Guide for details of the accuracy of the initial estimate provided by vrcpps.)
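For reference, here is a short sketch of why a single step is enough to matter. It assumes the standard Newton-Raphson recurrence for a reciprocal and Intel's documented bound on the relative error of the estimate (at most 1.5 × 2^-12); the symbols a, x_n and ε_n are introduced here for illustration only, and rounding in the refinement arithmetic itself is ignored.

```latex
% Newton-Raphson for f(x) = 1/x - a, whose root is x = 1/a:
x_{n+1} = x_n \,(2 - a\,x_n)

% Write the current estimate with relative error \varepsilon_n:
x_n = \frac{1}{a}\,(1 - \varepsilon_n)

% Substituting gives
x_{n+1} = \frac{1}{a}\,(1 - \varepsilon_n)(1 + \varepsilon_n)
        = \frac{1}{a}\,(1 - \varepsilon_n^{2})
\quad\Longrightarrow\quad \varepsilon_{n+1} = \varepsilon_n^{2}

% With |\varepsilon_0| \le 1.5 \times 2^{-12}:
|\varepsilon_1| \le 2.25 \times 2^{-24} \approx 2^{-22.8}
```

So each step roughly doubles the number of correct bits, which takes the estimate from about 12 bits to close to full single precision.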

I have figured out why. The SSE/AVX approximation instructions (such as vrcpps) need at least one Newton-Raphson iteration to restore accuracy. A FLOAT32 significand stores 23 bits (24 counting the implicit leading bit), but the raw estimate is only accurate to about 11 to 12 bits (Intel documents the relative error as at most 1.5 × 2^-12), so roughly half of the precision is lost. That approximation is far too rough to be used directly when a full single-precision result is expected.
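To make the accuracy gap concrete, here is a minimal sketch in C using the SSE intrinsic `_mm_rcp_ps`. It is not the code from the question and not necessarily the exact sequence GCC emits; it only illustrates the raw estimate next to one Newton-Raphson step. The function names `recip_estimate`, `recip_refined` and `max_rel_error` are hypothetical helpers made up for this example.

```c
#include <math.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rcp_ps, _mm_mul_ps, ... */

/* Raw hardware estimate of 1/a for four packed floats. */
static __m128 recip_estimate(__m128 a) {
    return _mm_rcp_ps(a);
}

/* Estimate followed by one Newton-Raphson step: x1 = x0 * (2 - a * x0). */
static __m128 recip_refined(__m128 a) {
    __m128 x0  = _mm_rcp_ps(a);
    __m128 two = _mm_set1_ps(2.0f);
    return _mm_mul_ps(x0, _mm_sub_ps(two, _mm_mul_ps(a, x0)));
}

/* Largest relative error of approx versus 1/a across the four lanes.
   Hypothetical helper; the reference value is computed in double. */
static float max_rel_error(__m128 a, __m128 approx) {
    float av[4], xv[4];
    float worst = 0.0f;
    _mm_storeu_ps(av, a);
    _mm_storeu_ps(xv, approx);
    for (int i = 0; i < 4; ++i) {
        double exact = 1.0 / (double)av[i];
        float err = (float)fabs(((double)xv[i] - exact) / exact);
        if (err > worst) {
            worst = err;
        }
    }
    return worst;
}
```

Calling `max_rel_error(a, recip_estimate(a))` and `max_rel_error(a, recip_refined(a))` on the same input, for example `_mm_setr_ps(3.0f, 7.0f, 0.3f, 123.456f)`, should show the first error near the 2^-12 range and the second close to single-precision rounding error. When compiled with `-mavx`, the same intrinsic is encoded as vrcpps; the 256-bit counterpart is `_mm256_rcp_ps` from `<immintrin.h>`, and the refinement step has the same shape.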
