How to control whether C math uses SSE2?

Posted by 旧巷老猫 on 2019-12-03 22:23:53

I found the answer through careful investigation of math.h. This is controlled by a function called _set_SSE2_enable. It is a public symbol, documented as follows:

Enables or disables the use of Streaming SIMD Extensions 2 (SSE2) instructions in CRT math routines. (This function is not available on x64 architectures because SSE2 is enabled by default.)

This causes the aforementioned ___use_sse2_mathfcns flag to be set to the provided value, effectively enabling or disabling use of the _pentium4 SSE2 routines.
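As a rough sketch of how the call might be wrapped: _set_SSE2_enable is declared in math.h by the Microsoft CRT and exists only on 32-bit x86 builds, so the call is guarded here. The wrapper name crt_math_set_sse2 and the -1 "not available" convention are assumptions made for this example, not part of the CRT.

```c
#include <math.h>   /* on MSVC, this header declares _set_SSE2_enable */

/* Hypothetical helper, not part of the CRT.
 * Returns 1 if the CRT math routines will now use SSE2,
 *         0 if they will not,
 *        -1 if _set_SSE2_enable does not exist for this target
 *           (x64 builds, or a compiler other than MSVC). */
int crt_math_set_sse2(int enable)
{
#if defined(_MSC_VER) && defined(_M_IX86)
    /* The CRT returns nonzero when SSE2 use is enabled, zero otherwise. */
    return _set_SSE2_enable(enable) != 0;
#else
    (void)enable;   /* nothing to toggle on this target */
    return -1;
#endif
}
```

Calling crt_math_set_sse2(0) once at startup would ask the CRT to fall back to its non-SSE2 routines. A result of 1 after crt_math_set_sse2(1) means the request was honoured, which requires both processor and operating system support for SSE2.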

The documentation mentions this affects only certain transcendental functions, but looking at the disassembly, this seems to affect every one of them.

Edit: stepping into every function reveals that they're all available in SSE2 except for the following:

  • fmod
  • sinh
  • cosh
  • tanh
  • sqrt

sqrt is the biggest offender, but it's trivial to implement in SSE2 using intrinsics. For the others, there's no simple solution except perhaps using a third-party library, but I can probably do without them.
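A minimal sketch of an SSE2 square root using intrinsics, assuming a compiler that provides emmintrin.h and a target with SSE2 enabled (for example /arch:SSE2 on 32-bit MSVC or -msse2 on 32-bit GCC). The function name sqrt_sse2 is made up for illustration.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical replacement for sqrt() that always uses the SSE2
 * scalar square-root instruction (sqrtsd). */
double sqrt_sse2(double x)
{
    __m128d v = _mm_set_sd(x);   /* low lane = x, high lane = 0.0 */
    v = _mm_sqrt_sd(v, v);       /* low lane = sqrt(low lane of v) */
    return _mm_cvtsd_f64(v);     /* extract the low lane as a double */
}
```

Because sqrtsd is correctly rounded as IEEE 754 requires, this should give the same bits on every SSE2-capable machine, provided the rounding mode in MXCSR is left at its default.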

Why not use your own library instead of the C runtime? This would provide an even stronger guarantee of consistency across computers (presumably the C runtime is provided as a DLL and might change slightly in time).

I would recommend CRlibm. If you are already targeting SSE2, and as long as you did not intend to change the FPU's rounding mode, you are in the ideal conditions to use it, and you won't find a more accurate implementation.

The short answer is that you can't tell IN YOUR CODE for certain what the library will do, unless you also rely on details specific to the library implementation. These would make the code completely unportable - even two different builds of the same compiler may change the internals of the library.

Of course, if portability isn't an issue, then declaring extern <type> ___use_sse2_mathfcns; and checking whether it is true would clearly work.

I expect that if the processor has SSE2 and you are using a modern enough library, it would use SSE2 wherever possible. But to say that for certain is a different matter.

If this is critical for your code, then implement your own transcendental functions and use those - that's the only way to guarantee the same result. Alternatively, use some suitable inline assembler code to calculate selected sin, cos, etc. values, and compare those with the results of the sin() and cos() functions provided by the library.
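For that kind of comparison, the results need to match bit for bit, not just within a tolerance. A small sketch of a bit-exact comparison follows; the helper names double_bits and same_bits are invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 64-bit pattern of a double.
 * memcpy avoids the aliasing problems of a pointer cast. */
uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Hypothetical helper: 1 if a and b have identical bit patterns,
 * 0 otherwise. Unlike a == b, this distinguishes 0.0 from -0.0
 * and treats two NaNs with the same payload as equal. */
int same_bits(double a, double b)
{
    return double_bits(a) == double_bits(b);
}
```

One could then check same_bits(sin(x), reference) for a handful of chosen x, where reference is the value produced by your own implementation, and log double_bits of both on a mismatch.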
