fast-math

Why does GCC or Clang not optimise reciprocal to 1 instruction when using fast-math

Question: Does anyone know why GCC/Clang will not optimise function test1 in the code sample below to use just the RCPPS instruction when the fast-math option is enabled? Is there another compiler flag that would generate this code?

    typedef float float4 __attribute__((vector_size(16)));  // GCC/Clang vector extension: four packed floats

    float4 test1(float4 v) {
        return 1.0f / v;  // element-wise reciprocal
    }

You can see the compiled output here: https://goo.gl/jXsqat

Answer 1: Because the precision of RCPPS is a lot lower than that of float division, an option to enable that optimization would not be appropriate as part of -ffast-math. The x86 target options of the gcc manual says there in fact …

Does any floating point-intensive code produce bit-exact results in any x86-based architecture?

Question: I would like to know whether any C or C++ code using floating-point arithmetic would produce bit-exact results on any x86-based architecture, regardless of the complexity of the code. To my knowledge, every x86 architecture since the Intel 8087 uses an FPU designed to handle IEEE-754 floating-point numbers, and I cannot see any reason why the result would differ between architectures. However, if they were different (namely due to a different compiler or a different optimization level) …

What does gcc's ffast-math actually do?

Question: I understand gcc's -ffast-math flag can greatly increase speed for float ops and goes outside of IEEE standards, but I can't seem to find information on what is really happening when it's on. Can anyone please explain some of the details and maybe give a clear example of how something would change if the flag was on or off? I did try digging through S.O. for similar questions but couldn't find anything explaining the workings of -ffast-math.

Answer (Mysticial): As you mentioned, it allows optimizations that do not preserve strict IEEE compliance. An example is this: x = x*x*x*x*x*x*x*x; to x *= x; x *= …

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

Question: I am doing some numerical optimization on a scientific application. One thing I noticed is that GCC will optimize the call pow(a,2) by compiling it into a*a, but the call pow(a,6) is not optimized and will actually call the library function pow, which greatly slows down performance. (In contrast, the Intel C++ Compiler, executable icc, will eliminate the library call for pow(a,6).) What I am curious about is that when I replaced pow(a,6) with a*a*a*a*a*a using GCC 4.5.1 and options "-O3 …
