OpenCL reduction result wrong with large floats

血红的双手。 提交于 2019-12-01 14:41:02

There is probably no error in the coding of your kernel or host application. The issue is with the single-precision floating point.

The correct sum is: 65537 * 32768 = 2147516416, and it takes 31 bits to represent it in binary (10000000000000001000000000000000). 32-bit floats can only hold integers accurately up to 2^24.

"Any integer with absolute value less than [2^24] can be exactly represented in the single precision format" "Floating Point" article, wikipedia

This is why you are getting the correct sum when it is less than or equal to 2^24. If you are doing a complete sum using single-precision, you will eventually lose accuracy no matter which device you are executing the kernel on. There are a few things you can do to get the correct answer:

  • use double instead of float if your platform supports it
  • use int or unsigned int
  • sum a smaller set of numbers eg: 0+1+2+...+4095+4096 = (2^23 + 2^11)

Read more about single precision here.

This is a "side effect" of summing floating point numbers using finite precision CPU's or GPU's. The accuracy depends the algorithm and the order the values are summed. The theory and practice behind is explained in Nicholas J, Higham's paper

The Accuracy of Floating Point Summation

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=7AECC0D6458288CD6E4488AD63A33D5D?doi=10.1.1.43.3535&rep=rep1&type=pdf

The fix is to use a smarter algorithm like the Kahan Summation Algorithm

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

And the Higham paper has some alternatives too.

This problem illustrates the nature of benchmarking, the first rule of the benchmark is to get the right answer, using realistic data!

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!