Is a hardcoded float precise if it can be represented in IEEE 754 binary format?

Submitted by 混江龙づ霸主 on 2019-12-18 04:44:12

Question


For example, 0, 0.5, 0.15625, 1, 2, 3, ... are values that convert exactly to IEEE 754 binary format. Are their hardcoded versions precise?

For example, does

float a = 0;
if (a == 0) {
    return true;
}

always return true? Another example:

float a = 0.5;
float b = 0.25;
float c = 0.125;

Is a * b always equal to 0.125, and is a * b == c always true? And one more example:

int a = 123;
float b = 0.5;

Is a * b always 61.5? Or, in general, is multiplying an integer by an IEEE 754 binary float precise?

Or, as a more general question: if the values are hardcoded and both the values and the result can be represented in IEEE 754 binary format (e.g. 0.5 - 0.125), is the result precise?
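A minimal, self-contained sketch that exercises these cases (assuming float is IEEE 754 binary32; every constant involved has a finite binary representation):

#include <cassert>

int main() {
    float a = 0.5f;
    float b = 0.25f;
    float c = 0.125f;
    assert(a * b == c);              // 0.5 * 0.25 is exactly 0.125

    assert(0.5f - 0.125f == 0.375f); // operands and result are all exact

    int i = 123;
    float h = 0.5f;
    assert(i * h == 61.5f);          // 61.5 fits easily in 24 significand bits
    return 0;
}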


Answer 1:


There is no inherent fuzziness in floating-point numbers. It's just that some, but not all, real numbers can't be exactly represented.

Compare with a fixed-width decimal representation with, say, three digits: the integer 1 can be represented as 1.00, and 1/10 as 0.10, but 1/3 can only be approximated, as 0.33.

If we instead use binary digits, the integer 1 would be represented as 1.00 (binary digits), 1/2 as 0.10, 1/4 as 0.01, but 1/3 can (again) only be approximated.
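For instance, printing with more digits than the format actually stores makes this visible (a small sketch, assuming an IEEE 754 binary double):

#include <iomanip>
#include <iostream>

int main() {
    std::cout << std::setprecision(20);
    std::cout << 0.5     << "\n";  // 1/2 is 0.10 in binary: prints exactly 0.5
    std::cout << 0.25    << "\n";  // 1/4 is 0.01 in binary: prints exactly 0.25
    std::cout << 1.0 / 3 << "\n";  // 1/3 has no finite binary form: prints 0.33333333333333331483
}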

There are some things to remember, though:

  • It's not the same numbers as with decimal digits. 1/10 can be written exactly as 0.1 using decimal digits, but not using binary digits, no matter how many you use (short of infinity).
  • In practice, it is difficult to keep track of which numbers can be represented and which can't. 0.5 can, but 0.4 can't (see the sketch after this list). So when you need exact numbers, such as (often) when working with money, you shouldn't use floating-point numbers.
  • Some processors do strange things internally when performing floating-point calculations on numbers that can't be exactly represented (notably, the x87 FPU keeps extra precision in its 80-bit registers), causing results to vary in a way that is, in practice, unpredictable.
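The second point is easy to demonstrate (a small sketch, assuming IEEE 754 binary32 floats; the digits shown are what a typical implementation prints):

#include <iomanip>
#include <iostream>

int main() {
    std::cout << std::setprecision(20);
    std::cout << 0.5f << "\n";  // exact:        prints 0.5
    std::cout << 0.4f << "\n";  // approximated: prints 0.40000000596046447754
    std::cout << 0.1f << "\n";  // approximated: prints 0.10000000149011611938
}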

(My opinion is that it's actually a reasonable first approximation to say that yes, floating-point numbers are inherently fuzzy, so unless you are sure your particular application can handle that, stay away from them.)

For more details than you probably need or want, read the famous What Every Computer Scientist Should Know About Floating-Point Arithmetic. Also, this somewhat more accessible website: The Floating-Point Guide.




Answer 2:


No, but as Thomas Padron-McCarthy says, some numbers can be exactly represented using binary but not all of them can.

This is the way I explain it to the non-developers I work with (like Mahmut Ali, I also work on a very old financial package): imagine a very large cake that is cut into 256 slices. You can give one person the whole cake, or two people 128 slices each, but as soon as you decide to split it between three people you can't: each gets either 85 or 86 slices, because you can't split the slices any further. It's the same with floating point: only some numbers can be represented exactly; the rest can only be closely approximated.




Answer 3:


C++ does not require a binary floating-point representation. Built-in integers are required to have a binary representation (commonly two's complement, but one's complement and sign-and-magnitude are also supported), but floating point can be, e.g., decimal.

This leaves open the question of whether a C++ floating-point type can have a radix that does not have 2 as a prime factor, unlike 2 and 10. Are other radixes permitted? I don't know, and the last time I tried to check, I failed to find out.

However, assuming that the radix must be 2 or 10, all your examples involve values with finite binary representations (sums of a few powers of 2), and since every finite binary fraction is also a finite decimal fraction, they can be exactly represented either way.

This means that the single answer to most of your questions is “yes”. The exception is the question “is integer multiply by IEEE 754 binary float [exact]”. If the result exceeds the precision available, then it can't be exact, but otherwise it is.
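That exception is easy to hit (a sketch, assuming IEEE 754 binary32 with its 24-bit significand):

#include <iostream>

int main() {
    // Integers up to 2^24 = 16777216 are exact in a float.
    int small = 123;
    float half = 0.5f;
    std::cout << (small * half == 61.5f) << "\n";  // 1: within precision, exact

    // 2^24 + 1 does not fit in 24 significand bits, so the
    // int-to-float conversion already rounds it to 16777216.
    int big = 16777217;
    float one = 1.0f;
    std::cout << (static_cast<long long>(big * one) == big) << "\n";  // 0: not exact
}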

See the classic “What Every Computer Scientist Should Know About Floating-Point Arithmetic” for background info about floating point representation & properties in general.


If a value can be exactly represented in 32-bit or 64-bit IEEE 754, that doesn't mean it can be exactly represented with some other floating-point representation. That's because different 32-bit representations and different 64-bit representations use different numbers of bits to hold the mantissa and have different exponent ranges. So a number that can be exactly represented one way can be beyond the precision or range of some other representation.
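The standard traits expose those parameters (a sketch; the commented values are the usual IEEE 754 binary32/binary64 ones):

#include <iostream>
#include <limits>

int main() {
    std::cout << std::numeric_limits<float>::digits  << "\n";        // 24 significand bits
    std::cout << std::numeric_limits<double>::digits << "\n";        // 53 significand bits
    std::cout << std::numeric_limits<float>::max_exponent  << "\n";  // 128
    std::cout << std::numeric_limits<double>::max_exponent << "\n";  // 1024
}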


You can use std::numeric_limits<T>::is_iec559 (where e.g. T is double) to check whether your implementation claims to be IEEE 754 compatible. However, when floating-point optimizations are turned on, at least the g++ compiler (1) erroneously claims to be IEEE 754 while not treating e.g. NaN values correctly according to that standard. In effect, is_iec559 only tells you whether the number representation is IEEE 754, not whether the semantics conform.
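A minimal check (keeping in mind the caveat above: this reports the representation, not whether the compiled semantics conform):

#include <iostream>
#include <limits>

int main() {
    std::cout << std::boolalpha;
    std::cout << std::numeric_limits<float>::is_iec559  << "\n";
    std::cout << std::numeric_limits<double>::is_iec559 << "\n";
}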


(1) Essentially, instead of providing different types for different semantics, gcc and g++ try to accommodate different semantics via compiler options. And with separate compilation of parts of a program, that can't conform to the C++ standard.




Answer 4:


In principle, yes, this should be possible, provided you restrict yourself to exactly this class of numbers with a finite base-2 representation.

But it is dangerous: what if someone takes your code and changes your 0.5 to 0.4, or your 0.0625 to 0.065, for whatever reason? Then your code is broken. And no, even extensive comments won't help; someone will always ignore them.
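One possible guard (a hedged sketch, not from the original answer; requires C++11) is to encode the exactness assumption so that an edited constant at least fails loudly:

// Hypothetical guard: if someone later changes kHalf to 0.4f,
// 0.4f * 2.0f is not exactly 1.0f and compilation fails.
constexpr float kHalf = 0.5f;
static_assert(kHalf * 2.0f == 1.0f, "kHalf must be exactly 0.5");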



Source: https://stackoverflow.com/questions/33646148/is-hardcode-float-precise-if-it-can-be-represented-by-binary-format-in-ieee-754
