OpenCL Floating point precision

南楼画角 submitted on 2019-11-30 10:04:41

Since nothing changes in the GPU code when you switch your project from x86 to x64, the difference must come from how the multiplication is performed on the CPU. There are some subtle differences in floating-point handling between x86 and x64 modes. The biggest one is that every x64 CPU supports SSE and SSE2, so the compiler uses them by default for math operations in 64-bit mode on Windows.

The HD4770 GPU does all computations using single-precision floating point units. Modern x64 CPUs on the other hand have two kinds of functional units that handle floating point numbers:

  • x87 FPU, which operates with the much higher extended precision of 80 bits
  • SSE unit, which operates with 32-bit and 64-bit precision and is much more compatible with how other processors handle floating-point numbers

In 32-bit mode the compiler does not assume that SSE is available and generates the usual x87 FPU code to do the math. In this case operations like data[i] * data[i] are performed internally using the much higher 80-bit precision. A comparison such as if (results[i] == data[i] * data[i]) is performed as follows:

  • data[i] is pushed onto the x87 FPU stack using FLD DWORD PTR data[i]
  • data[i] * data[i] is computed using FMUL DWORD PTR data[i]
  • results[i] is pushed onto the x87 FPU stack using FLD DWORD PTR results[i]
  • both values are compared using FUCOMPP

Here is the problem. data[i] * data[i] sits in an x87 FPU stack element in 80-bit precision, while results[i] comes from the GPU in 32-bit precision. The two numbers will most likely differ, because data[i] * data[i] has many more significant digits, whereas results[i], once extended to 80 bits, has its low-order bits filled with zeros.
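The effect can be reproduced without an x87 FPU. The sketch below is only an illustration: it uses double as a stand-in for the wider x87 register (an assumption, since the real x87 format is 80 bits wide), and the helper name square_fits_in_float is hypothetical. It squares a float in the wider format and checks whether the value survives rounding back to float.

```c
/* Returns 1 if the exact square of f is representable as a float,
   0 if rounding to float changes it.
   Assumption: double stands in for the wider x87 register. */
int square_fits_in_float(float f)
{
    /* A 24-bit significand squared needs at most 48 bits, so the
       product is exact in a double (barring overflow/underflow). */
    double wide = (double)f * (double)f;

    /* volatile forces the value to be rounded to a 32-bit float. */
    volatile float narrow = (float)wide;

    return (double)narrow == wide;
}
```

For a value such as 0.5f the square is exactly representable and the function returns 1. For a value such as 0.1f the exact square needs more significand bits than a float has, so the function returns 0, which is the same kind of mismatch the x87 comparison runs into.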

In 64-bit mode things happen differently. The compiler knows that your CPU is SSE capable and uses SSE instructions to do the math. The same comparison statement is performed in the following way on x64:

  • data[i] is loaded into an SSE register using MOVSS XMM0, DWORD PTR data[i]
  • data[i] * data[i] is computed using MULSS XMM0, DWORD PTR data[i]
  • results[i] is loaded into another SSE register using MOVSS XMM1, DWORD PTR results[i]
  • both values are compared using UCOMISS XMM1, XMM0

In this case the square operation is performed with the same 32-bit single precision as is used on the GPU. No intermediate results with 80-bit precision are generated. That is why the results are the same.
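One way to see which of the two behaviours a given build uses is the C99 macro FLT_EVAL_METHOD from <float.h>. The sketch below is a hedged illustration: the function name float_eval_description is hypothetical, and older compilers may not define the macro at all, which is why the code guards it with #ifdef.

```c
#include <float.h>

/* Describes the precision the compiler uses for floating-point
   intermediates, based on the C99 macro FLT_EVAL_METHOD. */
const char *float_eval_description(void)
{
#ifdef FLT_EVAL_METHOD
    switch (FLT_EVAL_METHOD) {
    case 0:  /* typical for SSE code generation */
        return "operations are evaluated in the precision of their type";
    case 1:
        return "float and double operations are evaluated as double";
    case 2:  /* typical for x87 code generation */
        return "all operations are evaluated as long double";
    default:
        return "evaluation precision is indeterminable or implementation-defined";
    }
#else
    return "FLT_EVAL_METHOD is not defined by this compiler";
#endif
}
```

Printing the returned string from a 32-bit and a 64-bit build of the same project should show whether float intermediates are kept in a wider format, assuming the compiler reports the macro accurately.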

It is very easy to test this without involving the GPU at all. Just run the following simple program:

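#include <stdlib.h>
#include <stdio.h>

/* Passing the square through a float variable and return value
   rounds it to 32-bit precision. */
float mysqr(float f)
{
    f *= f;
    return f;
}

int main (void)
{
    int i, n;   /* i counts mismatches, n counts iterations */
    float f;

    srand(1);
    for (i = n = 0; n < 1000000; n++)
    {
        f = rand()/(float)RAND_MAX;
        /* f*f may still be held in extended precision here */
        if (mysqr(f) != f*f) i++;
    }
    printf("%d of %d squares differ\n", i, n);
    return 0;
}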

mysqr is specifically written so that the intermediate 80-bit result gets converted to a 32-bit precision float. If you compile and run in 64-bit mode, the output is:

0 of 1000000 squares differ

If you compile and run in 32-bit mode, output is:

999845 of 1000000 squares differ

In principle you should be able to change the floating-point model in 32-bit mode (Project properties -> Configuration Properties -> C/C++ -> Code Generation -> Floating Point Model), but doing so changes nothing, since at least on VS2010 intermediate results are still kept in the FPU. What you can do is force a store and reload of the computed square, so that it is rounded to 32-bit precision before it is compared with the result from the GPU. In the simple example above this is achieved by changing:

if (mysqr(f) != f*f) i++;

to

if (mysqr(f) != (float)(f*f)) i++;

After the change, the output of the 32-bit code becomes:

0 of 1000000 squares differ

In my case

(float)(f*f)

didn't help. I used

  correct = 0;
  for(unsigned int i = 0; i < count; i++) {
    volatile float sqr = data[i] * data[i];
    if(results[i] == sqr)
      correct++;
  }

instead.
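The loop above can be wrapped in a self-contained function. This is a minimal sketch: the function name count_matching_squares and its signature are hypothetical, and data and results are assumed to be plain float arrays of length count, with results holding the squares read back from the GPU.

```c
#include <stddef.h>

/* Counts how many entries of results equal the float-rounded square
   of the matching entry of data. */
size_t count_matching_squares(const float *data, const float *results,
                              size_t count)
{
    size_t correct = 0;
    for (size_t i = 0; i < count; i++) {
        /* volatile forces the product to be stored to memory as a
           32-bit float, discarding any extra precision that may be
           held in an FPU register. */
        volatile float sqr = data[i] * data[i];
        if (results[i] == sqr)
            correct++;
    }
    return correct;
}
```

Calling count_matching_squares(data, results, count) and comparing the return value with count gives the same check as the original loop, with the rounding step made explicit.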
